Series Preface


As any human activity needs goals, mathematical research needs problems. 
—David Hilbert 


Mechanics is the paradise of mathematical sciences. 
—Leonardo da Vinci 


Mechanics and mathematics have been complementary partners since New- 
ton’s time, and the history of science shows much evidence of the beneficial 
influence of these disciplines on each other. Driven by increasingly elabo- 
rate modern technological applications, the symbiotic relationship between 
mathematics and mechanics is continually growing. However, the increasingly 
large number of specialist journals has generated a duality gap between the 
partners, and this gap is growing wider. 

Advances in Mechanics and Mathematics (AMMA) is intended to bridge 
the gap by providing multidisciplinary publications that fall into the two 
following complementary categories: 


1. An annual book dedicated to the latest developments in mechanics 
and mathematics; 

2. Monographs, advanced textbooks, handbooks, edited volumes, and 
selected conference proceedings. 


The AMMA annual book publishes invited and contributed comprehensive 
research and survey articles within the broad area of modern mechanics and 
applied mathematics. The discipline of mechanics, for this series, includes 
relevant physical and biological phenomena such as: electromagnetic, ther- 
mal, and quantum effects, biomechanics, nanomechanics, multiscale model- 
ing, dynamical systems, optimization and control, and computation methods. 
Especially encouraged are articles on mathematical and computational mod- 
els and methods based on mechanics and their interactions with other fields. 
All contributions will be reviewed so as to guarantee the highest possible
scientific standards. Each chapter will reflect the most recent achievements in
the area. The coverage should be conceptual, concentrating on the
methodological thinking that will allow the nonspecialist reader to understand it.
Discussion of possible future research directions in the area is welcome. 
Thus, the annual volumes will provide a continuous documentation of the 
most recent developments in these active and important interdisciplinary 
fields. Chapters published in this series could form bases from which possible 
AMMA monographs or advanced textbooks could be developed. 

Volumes published in the second category contain review/research contri- 
butions covering various aspects of the topic. Together these will provide an 
overview of the state-of-the-art in the respective field, extending from an in- 
troduction to the subject right up to the frontiers of contemporary research. 
Certain multidisciplinary topics, such as duality, complementarity, and sym- 
metry in mechanics, mathematics, and physics are of particular interest. 

The Advances in Mechanics and Mathematics series is directed to all sci- 
entists and mathematicians, including advanced students (at the doctoral 
and postdoctoral levels) at universities and in industry who are interested in 
mechanics and applied mathematics. 


David Y. Gao 
Ray W. Ogden 


Preface 


Complementarity and duality are closely related, multidisciplinary topics that 
pervade all natural phenomena, and form the basis for solving many under- 
lying nonconvex analysis and global optimization problems that arise in sci- 
ence and engineering. During the last forty years, much research has been 
devoted to the development of mathematical modeling, theory, and compu- 
tational methods in this area. The field has now matured in convex systems, 
especially in linear programming, engineering mechanics and design, mathe- 
matical physics, economics, optimization, and control. However, in nonconvex 
systems many fundamental problems still remain unsolved. 

In view of the importance of complementarity—duality theory and meth- 
ods in applied mathematics and mathematical programming, and in order 
to bridge the ever-increasing gap between global optimization and engineer- 
ing science, the First International Conference on Complementarity, Dual- 
ity, and Global Optimization (CDGO) was held at Virginia Tech, Blacksburg, 
August 15-17, 2005, under the sponsorship of the National Science Founda- 
tion. This conference brought together more than 100 world-class researchers 
from interdisciplinary fields of industrial engineering, operations research, 
pure and applied math, engineering mechanics, electrical engineering, psy- 
chology, management science, civil engineering, and computational science. 
This conference spawned some new trends in optimization and engineering 
science, and has stimulated young faculty and students to venture into this 
rich domain of research. 

This AMMA Annual contains eleven chapters from selected lectures pre- 
sented at the First International Conference on Complementarity, Duality, 
and Global Optimization (CDGO) in August 2005 and three invited chapters 
by experts in computational mathematics and quantum computations. These 
chapters deal with fundamental theory, methods, and applications of comple- 
mentarity, duality, and global optimization in multidisciplinary fields of global 
optimization, nonconvex mechanics, and computational science, as well as the 
very contemporary topic of quantum computing, which is at the forefront of
the scientific and technological research and development of the twenty-first
century. 

This special volume is dedicated to Gilbert Strang on the occasion of his 
70th birthday. Professor Strang is world-renowned not only for his scientific
contributions but also for his personal character, which exemplifies
what a real scientist should possess. During his exceptional academic
career and social activities spanning almost a half-century, Dr. Strang has 
had a profound influence on the development of interdisciplinary fields in 
applied mathematics, mechanics, and engineering science, including the field 
of complementary duality in calculus of variations, optimization, numerical 
methods, and mathematical education. The unified beauty of duality can be 
viewed throughout his celebrated textbooks, lecture notes, essays, and scien- 
tific publications, which will continue to influence several generations in the 
broad field of mathematical sciences. 

Credit for this special volume is to be shared by all the eminent contribut- 
ing authors. As editors, we are deeply indebted to them. Our special thanks 
also go to Ann Kostant and her team, and especially to Elizabeth Loew at 
Springer, for all their great enthusiasm and professional help in expediting 
the publication of this annual volume. 


May 2008 David Y. Gao 
Blacksburg, VA Hanif D. Sherali 


Constrained Optimism 


Gilbert Strang 


The editors have kindly invited me to write a few words of introduction to this 
volume. They even expressed the hope that I would go beyond mathematics, 
to say something about my own life experiences. I think every reader will 
recognize how hard it is (meaning impossible) to do that properly. If I choose 
a single word to describe an approach to the complications of life (and of 
mathematics too), it would be “optimism.” Eventually I realized that, if you 
allow that word in its mathematical sense too, this whole book is for optimists. 

If I may give one instance of my own optimism, it has come from writing 
textbooks. I enjoy the hopeless effort to express simple ideas clearly. Beyond 
that, I have come to expect (without knowing any reason, perhaps this defines 
an optimist) that the connections between all the pieces of the book will 
somehow appear. Suddenly a topic fits into its right place. This irrational 
certainty may also be the experience of a hopeful novelist who doesn’t know 
how the characters will interact and how the plot will turn out. 

Looking seriously at this approach, to applied mathematics or to life, an 
unconstrained optimism is hard to justify. Mathematically, an immediate con- 
straint on all of us is that we are “not Gauss.” Far wiser to accept constraints, 
and continue to optimize. The connection that did finally bring order to my 
own thinking and writing about applied mathematics and computational en- 
gineering was constrained optimization. I now call that the “Fundamental 
Problem of Scientific Computing.” 

Examples are everywhere, or those words would not be justified. So many 
problems involve three steps, and flows in networks are a good model. The 
potentials at the nodes, and the currents on the edges, are the unknowns 
(somehow dual). A first step goes from potentials to potential differences (by 
an edge-node matrix A). The second step relates potential differences to flows 
(by a matrix C). Ohm’s law is typical, or Hooke’s law, or any constitutive 
law: linear at first but not forever. The third step is the essential constraint 
of conservation or continuity or balance of forces, as in Kirchhoff’s current 
law. This involves the transpose matrix A’.

The dual role of A and A’ is at first a miracle. A reason begins to emerge
through minimization and Lagrange multipliers. If we minimize a quadratic 
energy with a linear constraint A’w = f, the optimality conditions lead to a 
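saddle point matrix (“KKT matrix”):


$\begin{bmatrix} C^{-1} & A \\ A' & 0 \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix} = \begin{bmatrix} b \\ f \end{bmatrix}$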


One way to solve this fundamental problem is to eliminate w. The three 
matrices combine into A’CA, symmetric and positive definite in the best
case. This is the stiffness matrix of the finite element method, or the Laplacian
matrix of finite differences and graph theory. It appears everywhere and we
don’t know the best way to solve the equation. As a differential equation
it is in divergence form with A’CA = div(c grad). When C is piecewise
linear we have mathematical programming, where the primal-dual method
has come to the front. The real problems of mechanics and biology (and
life) are not linear at all. But remarkably often they still have this form
with A’C(Au).
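
The three steps can be tried on a very small network. The sketch below is only an
illustration under stated assumptions: the 4-node, 5-edge network, the conductances,
the grounded node, and the sign convention e = -Au (no external voltage sources)
are all hypothetical choices, not taken from the text. It forms A’CA, solves for
the potentials u, and checks the balance constraint A’w = f.

```python
# Hypothetical 4-node, 5-edge network; node 3 is grounded (its potential is 0).
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # (start, end) of each edge
CONDUCTANCE = [1.0, 2.0, 1.0, 3.0, 1.0]            # diagonal of C (assumed values)
N_FREE = 3                                          # nodes 0..2 carry unknowns


def incidence(edges, n_free):
    """Edge-node matrix A: one row per edge, -1 at start node, +1 at end node."""
    rows = []
    for start, end in edges:
        row = [0.0] * n_free
        if start < n_free:
            row[start] = -1.0
        if end < n_free:
            row[end] = 1.0
        rows.append(row)
    return rows


def stiffness(A, c):
    """K = A' C A with C = diag(c)."""
    n = len(A[0])
    return [[sum(A[e][i] * c[e] * A[e][j] for e in range(len(A)))
             for j in range(n)] for i in range(n)]


def solve(K, rhs):
    """Gaussian elimination with partial pivoting (adequate for tiny systems)."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(K)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def network_flow(f):
    """Three steps: e = -Au, w = Ce, A'w = f, so that A'CA u = -f."""
    A = incidence(EDGES, N_FREE)
    K = stiffness(A, CONDUCTANCE)
    u = solve(K, [-fi for fi in f])
    e = [-sum(A[k][j] * u[j] for j in range(N_FREE)) for k in range(len(A))]
    w = [c * ek for c, ek in zip(CONDUCTANCE, e)]
    balance = [sum(A[k][j] * w[k] for k in range(len(A))) for j in range(N_FREE)]
    return K, u, w, balance


K, u, w, balance = network_flow([1.0, 0.0, -0.5])
```

Grounding one node removes a column of A, which is what makes A’CA positive
definite here; without it the constant potentials would lie in the nullspace.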

May I thank the editors and authors and readers of the present book. 
I hope you will accept constraints as inevitable, and go forward. 


Chapter 1


Maximum Flows and Minimum Cuts 
in the Plane 


Gilbert Strang 


Summary. A continuous maximum flow problem finds the largest $t$ such that
$\operatorname{div} v = tF(x,y)$ is possible with a capacity constraint $\|(v_1, v_2)\| \le c(x,y)$.
The dual problem finds a minimum cut $\partial S$ which is filled to capacity by
the flow through it. This model problem has found increasing application in 
medical imaging, and the theory continues to develop (along with new algo- 
rithms). Remaining difficulties include explicit streamlines for the maximum 
flow, and constraints that are analogous to a directed graph. 


Key words: Maximum flow, minimum cut, capacity constraint, Cheeger 


1.1 Introduction 


This chapter returns to a special class of problems (partial differential equa- 
tions with inequality constraints) in continuous linear programming. They 
describe flow through a domain $\Omega$, in analogy with flow along the edges of
a graph. The flow is maximized subject to a capacity constraint. The key to
the solution is the dual problem, which looks for a set $S \subset \Omega$ from which no
more flow is possible. The boundary of $S$ is the minimum cut, and it is filled
to capacity by the maximum flow. 

In the discrete case, Kirchhoff’s current law that “flow in = flow out” 
must hold at every interior node of the network. The maximum flow is the 
largest flow from source to sink, subject to Kirchhoff’s equation at the nodes
and capacity constraints on the edges. This fits the standard framework of
linear programming, and Kirchhoff’s incidence matrix (of 1s, −1s, and 0s)
has remarkable properties that lead to an attractive theory. Our purpose is
to point to a maximum flow-minimum cut theorem in the continuous case,
and to introduce new questions. 
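
For readers who want to see the discrete theorem at work, here is a minimal sketch.
It uses the Edmonds-Karp variant of augmenting paths (breadth-first search), which is
a choice made for this illustration and not a method named in the chapter; the
five-edge network and its capacities are hypothetical. The nodes reachable from the
source in the final residual graph form the set S, and the saturated edges leaving S
are the minimum cut.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp on a dict {(i, j): c_ij} of directed edge capacities."""
    residual = dict(capacity)
    for (i, j) in capacity:
        residual.setdefault((j, i), 0)          # reverse edges for cancellation
    nodes = {n for edge in residual for n in edge}
    neighbors = {n: [] for n in nodes}
    for (i, j) in residual:
        neighbors[i].append(j)

    def reachable():
        """Breadth-first search along edges with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if j not in parent and residual[(i, j)] > 0:
                    parent[j] = i
                    queue.append(j)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            break
        path = []
        j = sink
        while parent[j] is not None:            # walk back from sink to source
            path.append((parent[j], j))
            j = parent[j]
        push = min(residual[e] for e in path)   # bottleneck on this path
        for (i, j) in path:
            residual[(i, j)] -= push
            residual[(j, i)] += push
        flow += push

    S = set(parent)                             # source side of the cut
    cut = sorted(e for e in capacity if e[0] in S and e[1] not in S)
    return flow, S, cut


# Hypothetical network: source 's', sink 't', two interior nodes.
CAPACITY = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1,
            ('a', 't'): 2, ('b', 't'): 3}
flow, S, cut = max_flow_min_cut(CAPACITY, 's', 't')
cut_capacity = sum(CAPACITY[e] for e in cut)    # equals the maximum flow
```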

The principal unknown is the vector $v(x,y)$ that gives the magnitude and
direction of the flow. On a plane domain this is $v = (v_1(x,y), v_2(x,y))$. The
analogue of Kirchhoff’s matrix is the divergence operator:


Conservation: $\operatorname{div} v = \dfrac{\partial v_1}{\partial x} + \dfrac{\partial v_2}{\partial y} = tF(x,y)$ in $\Omega$. (1.1)


That source/sink term $tF(x,y)$ might be zero or nonzero in the interior of the
flow domain $\Omega$. There may also be a source term $tf(x,y)$ on the boundary
$\partial\Omega$, in closer analogy with the discrete source and sink in classical network
flow. With $n$ as the unit normal vector to $\partial\Omega$, the (possible) boundary sources
and sinks are given by a Neumann condition:


Boundary sources: $v \cdot n = tf(x,y)$ on $\partial\Omega_N$. (1.2)


Our examples involve $F$ but not $f$. We only note the case $\partial\Omega_N = \partial\Omega$,
when $f$ is prescribed on the whole boundary. Then the divergence theorem
$\iint \operatorname{div} v \,dx\,dy = \int v \cdot n \,ds$ imposes a compatibility condition on $F$ and $f$:


Compatibility: $\iint_\Omega F(x,y)\,dx\,dy = \int_{\partial\Omega} f(x,y)\,ds$ if $\partial\Omega_N = \partial\Omega$. (1.3)


Now comes the key inequality, a limit on the flow. The vector field $v(x,y)$
is subject to a capacity constraint, which makes the problem nonlinear. In
our original paper [31] this constraint measured $(v_1, v_2)$ always in the $\ell^2$ norm
at each point:


Capacity: $|v(x,y)| = \sqrt{v_1^2 + v_2^2} \le c(x,y)$ in $\Omega$. (1.4)

A more general condition would require $v(x,y)$ to lie in a convex set $K(x,y)$:

$v(x,y) \in K(x,y)$ for all $x, y$ in $\Omega$. (1.5)


A typical maximal flow problem in the domain $\Omega$ is

Maximize $t$ subject to (1.1), (1.2), and (1.4).


In returning to this maximal flow problem, our goal is to highlight four 
questions that were not originally considered. Fortunately there has been 
good progress by several authors, and partial answers are available. But the 
new tools are not yet all-powerful, as we illustrate with a challenge problem 
(uniform source $F = 1$ and capacity $c = 1$ with $\Omega$ = unit square). This
continues to resist explicit solution for the velocity vector $v$:


Challenge: Maximize $t$ so that $\operatorname{div} v = t$ with $|v| \le 1$ in $\Omega$. (1.6)


The intriguing aspect of this problem is that we can identify the minimal cut. 
Therefore we know the maximal flow factor $t = 2 + \sqrt{\pi}$, from the capacity
across that cut $\partial S$. Determining $\partial S$ is a constrained isoperimetric problem
that is pleasant to solve (and raises new questions). 

What we do not know is the flow vector v inside the square! Optimality 
tells us the magnitude and direction of v only along the cut, described below. 
We apologize for the multiplication of new challenges, when the proper goal 
of a chapter should be new solutions. 


1.2 New Questions and Applications 


The continuous maximal flow problem is attracting a small surge of inter- 
est. We mention recent papers that carry the problem forward in several 
directions: 


1.


Grieser [16] shows how max flow-min cut duality leads to an elegant proof of
Cheeger’s inequality, giving the lower bound in (1.18) on the first eigenvalue
of the Laplacian on $\Omega$. The eigenfunction has $u = 0$ on $\partial\Omega$, so $\partial\Omega_N$ is empty:


Cheeger: $\lambda_1 \ge \tfrac{1}{4} h^2$ where $h(\Omega) = t_{\max}$ with $F = 1$. (1.7)


The Cheeger constant $h$ is found from the constrained isoperimetric problem
that arises for the minimal cut $\partial S$:

Definition: $h(\Omega) = \inf_{S \subset \Omega} \dfrac{|\partial S|}{|S|} = \inf_{S \subset \Omega} \dfrac{\text{perimeter of } S}{\text{area of } S}$. (1.8)

As in the particular case of our challenge problem, $h(\Omega)$ is often computable.
For the unit square we note in (1.24) that the inequality (1.7) is far from 
tight. 


2. 


Appleton and Talbot [2] have proposed an algorithm for computing the max- 
imum flow vector v from a sequence of discrete problems. Their motivation is 
to study image segmentation with medical applications (see especially [3, 4]). 
The same techniques are successful in stereo matching [26]. Their paper is rich 
in ideas for efficient computations and an excellent guide to the literature.

The algorithm approaches the maximum flow field as $T \to \infty$, by introducing
a “Maxwell wave equation” with capacity $c$ and internal source $F = 0$:


Appleton-Talbot: $\dfrac{\partial E}{\partial T} = -\operatorname{div} v$, $\dfrac{\partial v}{\partial T} = -\operatorname{grad} E$, $|v| \le c$. (1.9)


The potential is $E$, and the first equation $\partial E/\partial T = -\operatorname{div} v$ is a relaxation of
the conservation constraint $\operatorname{div} v = 0$ (Kirchhoff’s law). Appleton and Talbot
prove that the energy $\iint (|E|^2 + |v|^2)$ is decreasing in every subset $S$ of $\Omega$.
At convergence, the optimal cut is the boundary of a level set of $E$.

The equations (1.9) are discretized on a staggered grid. This corresponds 
to Yee’s method (also called the FDTD method) in electromagnetics. The
algorithm has a weighting function to model the effect of source terms, and 
the experiments with image segmentation are very promising. 

Because primal—dual interior point algorithms have become dominant in 
optimization, we conjecture that those methods can be effective also here in 
the approximation of continuous by discrete maximal flows. 


3. 


Nozawa [23] took a major step in extending the max flow-min cut theorem
from the simple isotropic condition $|v| \le 1$ in (1.4) toward the much more
general capacity condition (1.5). This step can be illustrated already in our
challenge problem, by changing from the $\ell^2$ norm of $v(x,y)$ to the $\ell^1$ or $\ell^\infty$
norm:


$\ell^1$ challenge: Maximize $t$ so that $\operatorname{div} v = t$ with $|v_1| + |v_2| \le 1$ in $\Omega$. (1.10)


$\ell^\infty$ challenge: Maximize $t$ so that $\operatorname{div} v = t$ with $|v_1| \le 1$, $|v_2| \le 1$ in $\Omega$. (1.11)


In the isoperimetric problem (1.8), this changes the definition of perimeter.
The dual norm (in this case $\ell^\infty$ or $\ell^1$) becomes the measure of
arclength $(dx, dy)$. Then this dual norm enters the computation of $|\partial S|$:


Perimeter (in $\mathbf{R}^2$): $|\partial S| = \int_{\partial S} \|(dx, dy)\|$. (1.12)


The coarea formula from geometric measure theory [12], on which the proof 
of duality rests, continues to apply with the new definition. 

As in the $\ell^2$ case, the maximal $t$ can be computed! So we have new flow
fields to find, reaching bounds that duality says are achievable. 

It is intriguing to connect maximal flow with the other central problem 
for networks and continua: the transportation problem. This asks for shortest
paths. The original work of Monge and Kantorovich on continuous flows has
been enormously extended by Evans [11], Gangbo and McCann [15], Rachev
and Rüschendorf [25], and Villani [33].

Our challenge problem requires the movement of material $F(x,y)$ from $\Omega$
to $\partial\Omega$. The bottleneck is in moving from the interior of $S$ to the minimal
cut $\partial S$. The distribution of material is uniform in $S$, and its destination is
uniform along $\partial S$, to use all the capacity allowed by $|v| \le 1$. How is the
shortest path (Monge) flow from $S$ to $\partial S$ related to the maximum flow?


4. Directed Graphs and Flows. 


Chung [8, 9] has emphasized that Cheeger’s theory (and the Laplacian itself) 
is not yet fully developed for directed graphs. For maximal flow on networks, 
Ford and Fulkerson [13] had no special difficulty when the edge capacities 
depend on the direction of flow. The problem is still a linear program and 
duality still holds. 

For directed continuous flows we lack a correctly formulated duality
theorem. The capacity would be a constraint $v(x,y) \in K(x,y)$ as in (1.5).
Nozawa’s duality theorem in [23] quite reasonably assumed that zero is an
interior point of $K$. Then a flow field exists for sufficiently small $t$ (the feasible
set is not empty). The continuous analogue of direction-dependent capacities
seems to require analysis of more general convex sets $K(x,y)$, when zero is a
boundary point. In [22], Nozawa illustrated duality gaps when his hypotheses
were violated. 


Finally we mention that all these questions extend to domains $\Omega$ in $\mathbf{R}^n$.
The constrained isoperimetric problems generalize to higher dimensions as
well as different norms. The one simplification in the plane is the introduction
of a stream function $s(x,y)$, with $(v_1, v_2) = (\partial s/\partial y, -\partial s/\partial x)$ as the general
solution to $\operatorname{div} v = 0$. Our survey [30] formulated the corresponding primal
and dual problems for $s(x,y)$ as $L^1$ and $L^\infty$ approximations of planar vector
fields, where Laplace’s equation corresponds to $L^2$.
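
A quick numerical check of the stream function construction may help. The sketch
below is only an illustration: the particular function s(x, y) = sin(x) y^2 is a
hypothetical choice, and derivatives are replaced by central differences. Because
the mixed differences commute, the discrete divergence of (ds/dy, -ds/dx) vanishes
up to rounding error.

```python
import math


def stream(x, y):
    """A hypothetical smooth stream function, chosen only for illustration."""
    return math.sin(x) * y * y


def velocity(x, y, h=1e-4):
    """(v1, v2) = (ds/dy, -ds/dx), approximated by central differences."""
    v1 = (stream(x, y + h) - stream(x, y - h)) / (2 * h)
    v2 = -(stream(x + h, y) - stream(x - h, y)) / (2 * h)
    return v1, v2


def divergence(x, y, h=1e-4):
    """dv1/dx + dv2/dy, again by central differences."""
    dv1_dx = (velocity(x + h, y)[0] - velocity(x - h, y)[0]) / (2 * h)
    dv2_dy = (velocity(x, y + h)[1] - velocity(x, y - h)[1]) / (2 * h)
    return dv1_dx + dv2_dy


residual = divergence(0.3, 0.7)   # zero up to rounding error
```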

The remaining sections of this chapter discuss the topics outlined above. 
We compute the minimum cuts in the three versions of the challenge problem 
on the unit square. We also mention an isoperimetric problem (with a different 
definition of perimeter) to which we return in a later paper [32].


1.3 Duality, Coarea, and Cheeger Constants 


The maximum flow is limited by the capacity $c(x,y)$:

Primal problem: Maximize $t$ subject to
$\operatorname{div} v = tF(x,y)$ in $\Omega$, $v \cdot n = tf(x,y)$ on $\partial\Omega_N$, $|v(x,y)|_2 \le c(x,y)$ in $\Omega$.


Nozawa’s duality theorem requires a proper choice of function spaces and 
boundary conditions, in this problem and in its dual for $u(x,y)$ in $BV(\Omega)$.
Where the primal involves the divergence, the dual involves the gradient.
Kohn and Temam [19] extended Green’s formula $\iint u \operatorname{div} v = -\iint v \cdot \operatorname{grad} u$
to allow functions $u(x,y)$ of bounded variation.

We show that the optimal $u(x,y)$ in the dual problem is the characteristic
function of a set $S$ with finite perimeter. This $u(x,y)$ is not smooth, but it
lies in BV. The dual problem does not initially ask for a minimum cut. 


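Dual problem: Minimize $\|u\|_{BV,c}$ with $\ell(u) = 1$ or Minimize $\dfrac{\|u\|_{BV,c}}{|\ell(u)|}$

$\|u\|_{BV,c} = \iint_\Omega c(x,y)\,|\operatorname{grad} u|_2\,dx\,dy$, $\ell(u) = \int_{\partial\Omega_N} u f\,ds - \iint_\Omega u F\,dx\,dy$. (1.13)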


The key step toward the solution is to recognize the extreme points of the
unit ball in this weighted BV norm $\|u\|_{BV,c} = \iint c\,|\operatorname{grad} u|_2\,dx\,dy$. Those
extreme points are characteristic functions $u = \chi_S$ of open subsets $S$ of $\Omega$:


$\chi_S(x,y) = \{1 \text{ for } x, y \text{ in } S,\ 0 \text{ otherwise}\}$.


The BV norm of $\chi_S$ is the weighted perimeter $\int c\,ds$ of $S$, because the gradient
is a measure (a line of delta functions) supported only on that boundary $\partial S$.

The coarea formula gives the BV norm of u (weighted by the capacity c) 
as an integral over the norms of characteristic functions of level sets S(t) of u: 


Coarea: $\|u\|_{BV,c} = \int \|\chi_{S(t)}\|_{BV,c}\,dt$ with $S(t) = \{x, y \mid u(x,y) < t\}$. (1.14)
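
A discrete analogue makes the coarea formula concrete. The sketch below is an
illustration under assumptions that are not in the text: it uses unit capacity, an
integer-valued grid function, and the anisotropic (sum of absolute jumps) total
variation, for which the identity holds exactly. The grid values are hypothetical.
For integer data, the integral over t reduces to a sum over one threshold in each
unit interval.

```python
def total_variation(u):
    """Anisotropic total variation of a 2-D grid function: sum of |jumps|."""
    rows, cols = len(u), len(u[0])
    tv = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(u[i + 1][j] - u[i][j])    # vertical neighbor
            if j + 1 < cols:
                tv += abs(u[i][j + 1] - u[i][j])    # horizontal neighbor
    return tv


def coarea_sum(u):
    """Sum over thresholds t of the total variation of the level set {u < t}."""
    values = [x for row in u for x in row]
    total = 0
    for k in range(min(values), max(values)):
        t = k + 0.5                                 # one threshold per unit interval
        indicator = [[1 if x < t else 0 for x in row] for row in u]
        total += total_variation(indicator)
    return total


# Hypothetical integer-valued grid function.
U = [[0, 1, 3],
     [2, 2, 5],
     [0, 4, 1]]
lhs = total_variation(U)
rhs = coarea_sum(U)     # agrees with lhs
```

Each jump |u_i - u_j| is counted once for every threshold lying between the two
values, which is the whole content of the formula in this discrete setting.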

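Consider the case with $F \ge 0$ and no boundary sources $f$. Specializing
in (1.13) to the characteristic functions $u = \chi_S$, our dual problem reduces to
an isoperimetric problem for $S$ and the minimum cut $\partial S$:


$\min_{u \in BV} \dfrac{\iint c\,|\operatorname{grad} u|_2\,dx\,dy}{\iint F u\,dx\,dy} = \min_{S \subset \Omega} \dfrac{\text{weighted perimeter}}{\text{weighted area}} = \min_{S \subset \Omega} \dfrac{\int_{\partial S} c\,ds}{\iint_S F\,dx\,dy}$. (1.15)

Choosing $c(x,y) = 1$ and $F(x,y) = 1$, this computes the Cheeger constant.

Cheeger constant: $h(\Omega) = \inf_{S \subset \Omega} \dfrac{|\partial S|}{|S|}$. (1.16)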


Weak duality $h \ge t$ is the inequality $\int c\,ds \ge t \iint F\,dx\,dy$ for every feasible $t$
in the primal problem. This is just the divergence theorem when
$\operatorname{div} v = tF(x,y)$ and $|v| \le 1$:


Weak duality: $\int_{\partial S} c\,ds \ge \int_{\partial S} v \cdot n\,ds = \iint_S \operatorname{div} v\,dx\,dy = t \iint_S F\,dx\,dy$. (1.17)


Duality says that equality holds for the maximal flow $v$ and the minimal
cut $\partial S$.

Historically, the key inequality given by Cheeger [7] was a lower bound
on the first eigenvalue $\lambda_1$ of the Laplacian on the domain $\Omega$. Grieser [16]
observed how neatly and directly this bound follows from Green’s formula,
when $F = 1$ and $|v| \le 1$. We expect to see the Schwarz inequality in the step
from problems in $L^1$ and $L^\infty$ to the eigenvalue problem in $L^2$:


$t \iint u^2 = \iint (\operatorname{div} v)\,u^2 = -\iint v \cdot \operatorname{grad}(u^2) \le 2 \iint |u|\,|\operatorname{grad} u| \le 2 \left[\iint u^2\right]^{1/2} \left[\iint |\operatorname{grad} u|^2\right]^{1/2}$


Thus any feasible $t$ gives a lower bound $t^2/4$ to the Rayleigh quotient for any
$u(x,y)$ with $u = 0$ on $\partial\Omega$:


$\dfrac{t^2}{4} \le \dfrac{\iint |\operatorname{grad} u|^2\,dx\,dy}{\iint u^2\,dx\,dy}$ (1.18)

The minimum of the right side is $\lambda_1(\Omega)$, and the maximum of the left side is
$h^2/4$. Cheeger’s inequality becomes $h^2/4 \le \lambda_1(\Omega)$.

A widely studied paper [10] of Diaconis and Stroock introduces another 
very useful measure of the “bottleneck” that limits flow on a graph. 


1.4 The Challenge Problems 


When we use the $\ell^2$ norm of the flow vector $v = (v_1, v_2)$ at each point, the
constraint $|v(x,y)| \le c(x,y)$ is isotropic. Other norms of $v$ give constraints
that come closer to those on a discrete graph. The edges of the graph might 
be horizontal and vertical (from a square grid) or at 45° and —45° (from a 
staggered grid). We use the challenge problem with F = c = 1 on a unit 
square as an example that allows computation of the minimal cut in all three 
cases. 


$\ell^2$ constraint: $v_1^2 + v_2^2 \le 1$. (1.19)
$\ell^1$ constraint: $|v_1| + |v_2| \le 1$. (1.20)
$\ell^\infty$ constraint: $\max(|v_1|, |v_2|) \le 1$. (1.21)

The $\ell^1$ and $\ell^\infty$ norms give problems in linear programming.

The dual (minimum cut) problems use the dual norms. For $\ell^2$ we had
the usual BV norm $\iint |\operatorname{grad} u|\,dx\,dy$ and the usual measure $|\partial S| = \int ds$ of
the perimeter. For the $\ell^1$ and $\ell^\infty$ constraints, the BV norms change and the
perimeters reflect those changes (coming from the coarea formula in the new
norms):


$|v|_1 \le 1$ leads to

$\|u\|_{BV} = \iint |\operatorname{grad} u|_\infty\,dx\,dy$ and $|\partial S|_\infty = \int_{\partial S} \max(|dx|, |dy|)$,


$|v|_\infty \le 1$ leads to

$\|u\|_{BV} = \iint |\operatorname{grad} u|_1\,dx\,dy$ and $|\partial S|_1 = \int_{\partial S} |dx| + |dy|$.
as 


The perimeter of a square changes as the square is rotated, because the norm 
of (dx, dy) is changing. In each case the dual problem looks for the minimum 
cut as the solution to a constrained isoperimetric problem. 


Dual problems: Minimize over $S \subset \Omega$ the ratios $\dfrac{|\partial S|_\infty}{|S|}$ and $\dfrac{|\partial S|_1}{|S|}$. (1.22)


In all cases the optimal $S$ will reach the boundary $\partial\Omega$ of the square. (If $S$ is
stretched by a factor $c$, the areas in the denominators of (1.22) are multiplied
by $c^2$ and the numerators by $c$.) The symmetry of the problem ensures that
the optimal $\partial S$ contains four flat pieces of $\partial\Omega$, centered on the sides of the
square (Figure 1.1). The only parameter in the three optimization problems
is the length $L$ of those boundary pieces.

Figure 1.1a shows the solution for the $\ell^2$ problem, where the “unconstrained”
parts of the cut $\partial S$ are circular arcs. This follows from the classical
isoperimetric problem, and it is easy to show that the arcs must be tangent
to the square. The four arcs would fit together in a circle of radius $r$. With
$L = 1 - 2r$, the optimal cut solves the Cheeger problem:


$\min \dfrac{|\partial S|}{|S|} = h(\Omega) = \min_r \dfrac{\text{perimeter of } S}{\text{area of } S} = \min_r \dfrac{4(1 - 2r) + 2\pi r}{1 - 4r^2 + \pi r^2}$. (1.23)


The derivative of that ratio is zero when

$(1 - 4r^2 + \pi r^2)(8 - 2\pi) = (4 - 8r + 2\pi r)(8r - 2\pi r)$.


Cancel $8 - 2\pi$ to reach $1 - 4r + (4 - \pi)r^2 = 0$. Then $r = 1/(2 + \sqrt{\pi}) \approx .265$.
The Cheeger constant $h(\Omega)$ is the ratio $|\partial S|/|S| = 1/r = 2 + \sqrt{\pi}$.
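
The one-parameter minimization can also be confirmed numerically. The sketch below
is an illustration, not part of the original argument: it minimizes the ratio of
perimeter to area over the corner radius r by golden-section search (a method chosen
here for simplicity) and compares the result with the closed form.

```python
import math


def cheeger_ratio(r):
    """Perimeter / area for the unit square with four corner arcs of radius r."""
    perimeter = 4.0 * (1.0 - 2.0 * r) + 2.0 * math.pi * r
    area = 1.0 - 4.0 * r * r + math.pi * r * r
    return perimeter / area


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Admissible radii satisfy 0 <= r <= 1/2, so that L = 1 - 2r >= 0.
r_opt = golden_section_min(cheeger_ratio, 0.0, 0.5)
h = cheeger_ratio(r_opt)
r_exact = 1.0 / (2.0 + math.sqrt(math.pi))     # closed form from the text
h_exact = 2.0 + math.sqrt(math.pi)
```

At both endpoints r = 0 (the full square) and r = 1/2 (the inscribed circle) the
ratio equals 4, so the interior minimum is a genuine improvement on both.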

A prize of 10,000 yen was offered in [30] for the flow field that achieves
$\operatorname{div} v = 2 + \sqrt{\pi}$ with $|v| \le 1$. Lippert [20] and Overton [24] have the strongest
claim on the prize, by computing a close approximation to $v$. The discrete
velocity clearly confirms the cut in Figure 1.1a as the set where $|v| = 1$.

[Figure 1.1 panels: (a) $|\partial S| = 4L + 2\pi r$; (b) $|\partial S|_\infty = 4L + 4R$, with $L = 1 - 2R$; (c) $|\partial S|_1 = 4$, with $L = 1$.]

Fig. 1.1 The minimum cuts $\partial S$ for $\ell^2$, $\ell^1$, and $\ell^\infty$ constraints on $v(x,y)$.

The lowest eigenfunction of the Laplacian on the unit square is $(\sin \pi x)(\sin \pi y)$
and the lowest eigenvalue is $\lambda_1 = 2\pi^2$. Cheeger’s inequality $\lambda_1 \ge h^2/4$, which
other authors have tested earlier, is far from tight.


Unit square: $2\pi^2 \ge (2 + \sqrt{\pi})^2/4$ or $19.74 > 3.56$. (1.24)
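
The two numbers in (1.24) are easy to reproduce. As a sketch, the code below
evaluates the Rayleigh quotient of u = sin(pi x) sin(pi y) with a midpoint rule (the
grid size n = 200 is an arbitrary choice) and compares it with the Cheeger bound.

```python
import math


def rayleigh_quotient(n=200):
    """Midpoint-rule Rayleigh quotient of u = sin(pi x) sin(pi y) on the unit square."""
    grad_sq = 0.0
    u_sq = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        for j in range(n):
            y = (j + 0.5) / n
            u = math.sin(math.pi * x) * math.sin(math.pi * y)
            ux = math.pi * math.cos(math.pi * x) * math.sin(math.pi * y)
            uy = math.pi * math.sin(math.pi * x) * math.cos(math.pi * y)
            grad_sq += ux * ux + uy * uy
            u_sq += u * u
    return grad_sq / u_sq       # the cell area 1/n^2 cancels in the ratio


lambda_1 = rayleigh_quotient()                          # approximates 2 pi^2
cheeger_bound = (2.0 + math.sqrt(math.pi)) ** 2 / 4.0   # h^2 / 4
```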


The second challenge problem has $|v_1| + |v_2| \le 1$ leading to the measure
$|\partial S|_\infty$ of the perimeter in the dual. Now the unconstrained isoperimetric
problem is solved by a diamond with $|n_1| = |n_2| = 1/\sqrt{2}$ on all edges. The
optimal cut $\partial S$ in Figure 1.1b is a union of boundary pieces and diamond
edges. The edge length $\sqrt{2}R$ is multiplied by $1/\sqrt{2}$ from $|n_1| = |n_2|$ to give
$|\partial S|_\infty = 4L + 4R = 4 - 4R$. Then the minimum cut has $R = (2 - \sqrt{2})/2 \approx .3$:


$\min \dfrac{|\partial S|_\infty}{|S|} = \min_R \dfrac{4 - 4R}{1 - 2R^2} = 2 + \sqrt{2}$.
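
The stationarity condition here is a quadratic that can be solved directly. As a
sketch of that calculation (the coarse scan over R is an extra sanity check added
for this illustration): setting the derivative of (4 - 4R)/(1 - 2R^2) to zero gives
2R^2 - 4R + 1 = 0, whose root in the admissible range 0 <= R <= 1/2 is
R = (2 - sqrt 2)/2.

```python
import math


def l1_cut_ratio(R):
    """|dS|_inf / |S| when four corner triangles with legs R are cut from the square."""
    return (4.0 - 4.0 * R) / (1.0 - 2.0 * R * R)


# Root of 2R^2 - 4R + 1 = 0 that lies in [0, 1/2].
R_opt = (4.0 - math.sqrt(16.0 - 8.0)) / 4.0
t_max = l1_cut_ratio(R_opt)                     # value of the minimum ratio

# Coarse scan over admissible R: no sampled ratio falls below t_max.
scan_min = min(l1_cut_ratio(k / 1000.0) for k in range(0, 501))
```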


For the flow field $v$ in this $\ell^1$ problem, the prize is reduced to 5000 yen. Lippert
has reached the computational prize also in $\ell^1$. This is linear programming
and interior point methods soundly defeated the simplex method.

We cannot afford the prize in the $\ell^\infty$ problem, whose solution is simply
$v = (2x - 1, 2y - 1)$ on the square $0 \le x, y \le 1$ with $\operatorname{div} v = 4 = t_{\max}$.

The minimum cut for the $\ell^\infty$ problem is the whole boundary of the
square. This coincides with the unconstrained isoperimetric solution when
the perimeter is measured by $\int |dx| + |dy|$. The minimizing set $S$ would have
horizontal and vertical sides wherever the constraint $S \subset \Omega$ is inactive, and
here it is active everywhere on $\partial S = \partial\Omega$. The Cheeger constant in this norm
is $h = 4/1$.
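
This explicit flow allows a direct check of weak duality. The sketch below is an
illustration restricted to axis-aligned rectangles (an assumption made to keep the
boundary integrals simple): it verifies that v = (2x - 1, 2y - 1) has divergence 4,
respects the constraint max(|v1|, |v2|) <= 1, and that the flux through the boundary
of any rectangle S never exceeds its perimeter measured by |dx| + |dy|, with
equality on the whole square.

```python
import random


def v(x, y):
    """The flow field for the l-infinity challenge problem."""
    return (2.0 * x - 1.0, 2.0 * y - 1.0)


def divergence(x, y, h=1e-5):
    """dv1/dx + dv2/dy by central differences."""
    dv1 = (v(x + h, y)[0] - v(x - h, y)[0]) / (2 * h)
    dv2 = (v(x, y + h)[1] - v(x, y - h)[1]) / (2 * h)
    return dv1 + dv2


def rectangle_flux(a, b, c, d, n=200):
    """Outward flux of v through the boundary of [a, b] x [c, d] (midpoint rule)."""
    flux = 0.0
    for k in range(n):
        x = a + (b - a) * (k + 0.5) / n
        y = c + (d - c) * (k + 0.5) / n
        flux += (v(x, d)[1] - v(x, c)[1]) * (b - a) / n   # top minus bottom
        flux += (v(b, y)[0] - v(a, y)[0]) * (d - c) / n   # right minus left
    return flux


def l1_perimeter(a, b, c, d):
    """Perimeter of an axis-aligned rectangle measured by |dx| + |dy|."""
    return 2.0 * (b - a) + 2.0 * (d - c)


random.seed(0)
gaps = []
for _ in range(100):
    a, b = sorted(random.random() for _ in range(2))
    c, d = sorted(random.random() for _ in range(2))
    gaps.append(l1_perimeter(a, b, c, d) - rectangle_flux(a, b, c, d))

whole_square_gap = l1_perimeter(0, 1, 0, 1) - rectangle_flux(0, 1, 0, 1)
```

Every entry of `gaps` is nonnegative, and `whole_square_gap` vanishes up to
rounding: the cut along the full boundary is filled to capacity.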

In [32] we prove that the unit ball in the dual norm (rotated by $\pi/2$)
is isoperimetrically optimal. Here that ball is a circle or a diamond or a 
square. This isoperimetrix was discovered by Busemann [5] using the Brunn— 
Minkowski theorem in convex geometry (the Greeks knew much earlier about 
the circle). Our proof finds a simple linear equation for the support function 
of the optimal convex set S. 


Added in proof. Z. Milbers (unpublished thesis, Köln Universität, 2006) has found the
flow field in our challenge example (described at the end of Section 1.1)! New applications
of minimum cuts and maximum flow have also appeared in landslide modeling, the $L^1$
Laplacian, and especially image segmentation.


References 
1. N. Alon, Eigenvalues and expanders, Combinatorica 6 (1986) 86-96. 
2. B. Appleton and H. Talbot, Globally minimal surfaces by continuous maximal flows, 
IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 106-118. 
3. Y. Boykov and V. Kolmogorov, An experimental comparison of min-cut/max-flow 
algorithms for energy minimization in vision, IEEE Trans. Pattern Anal. Mach. Intell. 
26 (2004) 1124-1137. 
4. Y. Boykov, O. Veksler, and R. Zabih, Fast approximate energy minimization via graph 
cuts, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 1222-1239. 
5. H. Busemann, The isoperimetric problem in the Minkowski plane, Amer. J. Math. 69 
(1947) 863-871. 
6. J. D. Chavez and L. H. Harper, Duality theorems for a continuous analog of Ford- 
Fulkerson flows in networks, Adv. Appl. Math. 14 (1993) 369-388. 
7. J. Cheeger, A lower bound for the smallest eigenvalue of the Laplacian, Problems in 
Analysis, 1970, 195-199. 
8. F. R. K. Chung, Spectral graph theory, CBMS Regional Conference Series in Mathe- 
matics, vol. 92, 1997. 
9. F. R. K. Chung, Laplacians and the Cheeger inequality for directed graphs, Ann. 
Combinatorics 9 (2005) 1-19. 
10. P. Diaconis and D. W. Stroock, Geometric bounds for eigenvalues of Markov chains, 
Ann. Appl. Probab. 1 (1991) 36-61. 
11. L. C. Evans, Survey of applications of PDE methods to Monge-Kantorovich mass transfer
problems, www.math.berkeley.edu/~evans (earlier version: Current Developments
in Mathematics, 1997).
12. W. Fleming and R. Rishel, An integral formula for total gradient variation, Archiv der 
Mathematik 11 (1960) 218-222. 
13. L. R. Ford Jr. and D. R. Fulkerson, Flows in Networks, Princeton University Press, 
1962. 
14. L.R. Ford Jr. and D. R. Fulkerson, Maximal flow through a network, Canad. J. Math. 
8 (1956) 399-404. 
15. W. Gangbo and R. McCann, Optimal maps in Monge’s mass transport problem, C.R. 
Acad. Sci. Paris. Ser. I. Math. 325 (1995) 1653-1658. 
16. D. Grieser, The first eigenvalue of the Laplacian, isoperimetric constants, and the max 
flow min cut theorem, Archiv der Mathematik 87 (2006) 75-85. 
17. T. C. Hu, Integer Programming and Network Flows, Addison-Wesley, 1969. 
18. M. Iri, Theory of flows in continua as approximation to flows in networks, Survey of 
Mathematical Programming 2 (1979) 263-278. 
19. R. Kohn and R. Temam, Dual spaces of stresses and strains, Appl. Math. and Opt. 
10 (1983) 1-35. 
20. R. Lippert, Discrete approximations to continuum optimal flow problems, Stud. Appl. 
Math. 117 (2006) 321-333. 
21. J. S. Mitchell, On maximum flows in polyhedral domains, Proc. Fourth Ann. Symp.
Computational Geometry, 341-351, 1988.
22. R. Nozawa, Examples of max-flow and min-cut problems with duality gaps in continuous
networks, Math. Program. 63 (1994) 213-234.
23. R. Nozawa, Max-flow min-cut theorem in an anisotropic network, Osaka J. Math. 27
(1990) 805-842.
24. M. L. Overton, Numerical solution of a model problem from collapse load analysis,
Computing Methods in Applied Science and Engineering VI, R. Glowinski and J. L.
Lions, eds., Elsevier, 1984.
25. S. T. Rachev and L. Rüschendorf, Mass Transportation Problems I, II, Springer (1998).
26. S. Roy and I. J. Cox, A maximum-flow formulation of the n-camera stereo correspondence
problem, Proc. Int. Conf. Computer Vision (1988) 492-499.
27. F. Santosa, An inverse problem in photolithography, in preparation.
28. R. Sedgewick, Algorithms in C, Addison-Wesley, 2002.
29. G. Strang, A minimax problem in plasticity theory, Functional Analysis Methods in
Numerical Analysis, Z. Nashed, ed., Lecture Notes in Mathematics 701, Springer, 1979.
30. G. Strang, L1 and L∞ approximation of vector fields in the plane, Lecture Notes in
Num. Appl. Anal. 5 (1982) 273-288.
31. G. Strang, Maximal flow through a domain, Math. Program. 26 (1983) 123-143.
32. G. Strang, Maximum area with Minkowski measures of perimeter, Proc. Royal Society
of Edinburgh 138 (2008) 189-199.
33. C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics 58,
American Mathematical Society, 2003.


Chapter 2 


Variational Principles and Residual 
Bounds for Nonpotential Equations 


Giles Auchmuty 


Dedicated to Gil Strang for his 70th birthday 


Summary. Solutions of nonsymmetric linear equations whose symmetric 
part is positive definite are first characterized as saddle points of a strictly
convex-concave quadratic function. The associated primal problem is shown
to be equivalent to a weighted quadratic minimum residual optimization 
problem. An a posteriori error estimate for approximate solutions is derived. 
Similar results are then obtained for semilinear finite-dimensional systems of 
equations. These include global optimization problems for the solutions and 
existence results based on min-max theorems. Under further assumptions, 
uniqueness theorems are proven using saddle point theorems. 


2.1 Introduction 


In this chapter, some unconstrained global optimization problems for the 
solutions of finite-dimensional systems of equations that are not of potential 
type are described and analyzed. That is, the equations need not be obtained 
directly as the critical points of a differentiable function. 

This approach is first illustrated by considering the case of a linear equation 
involving a matrix that is nonsymmetric but has positive definite symmet- 
ric part. In Section 2.2 the solutions of such equations are shown to be the 
saddle point of certain functions, and the existence and uniqueness of these 
saddle points are proven directly. The dual variational principles associated 
with this saddle point problem are studied. In Section 2.3, the primal prob-
lem associated with this saddle problem is shown to be a weighted quadratic
residual minimization problem. Thus the variational principle provides ele-
mentary error estimates for the solution in terms of the value of the function
being minimized. These results extend to nonsymmetric problems some of 
the results that are well known for the solutions of symmetric positive def- 
inite problems. See the recent texts of Ainsworth and Oden [1] and Han [7] 
for applications of such results to the analysis of finite element simulations. 

Sections 2.4 and 2.5 describe extensions of these results to two very differ- 
ent classes of semilinear finite-dimensional equations. In each case, a min-max 
characterization of the solutions is described. These need not be saddle point 
problems but there is a primal functional whose global minima provide so- 
lutions of the original problem. A generalized Young’s inequality and results 
from minimax theory are used to prove existence results and obtain character- 
izations of the solutions. The problem in Section 2.4 may again be interpreted 
as a minimum residual problem and residual bounds for approximate solu- 
tions are described. The problem in Section 2.5 uses similar methods but 
makes quite different assumptions on the nonlinear term to obtain existence 
and, under further assumptions, uniqueness results. 

It is a particular pleasure to contribute this chapter to these conference 
proceedings in honor of Gilbert Strang as essentially all of the background 
material for these results has been beautifully treated in his texts on linear 
algebra and applied mathematics. 


2.2 Saddle Point Characterizations of Nonsymmetric 
Linear Equations 


Let $A$ be an $n \times n$ real matrix that satisfies the coercivity (or ellipticity)
condition that there is an $a_0 > 0$ such that

$$\langle Ax, x\rangle \ge a_0 \|x\|^2 \quad \text{for all } x \in \mathbb{R}^n. \tag{2.1}$$

Here the brackets indicate the usual Euclidean inner product and the norm
is the usual 2-norm. In general a function $V : \mathbb{R}^n \to \mathbb{R}$ is said to be coercive
provided

$$\|x\|^{-1} V(x) \to \infty \quad \text{as } \|x\| \to \infty.$$


In this section, it is shown that, when $A$ is coercive, the solutions of the
linear equation

$$Ax = f \tag{2.2}$$

for given $f \in \mathbb{R}^n$, may be characterized as saddle points of a quadratic
convex-concave function.

When $A$ is real symmetric and coercive, it is well known that the solution
of (2.2) is the unique minimizer of the energy functional $E(x)$ on $\mathbb{R}^n$ defined
by

$$E(x) := \langle Ax, x\rangle - 2\langle f, x\rangle. \tag{2.3}$$


A variety of different aspects of this problem is treated in Strang [8]. 

When $A$ is coercive, but not real symmetric, then Auchmuty [3] showed
that the solution of (2.2) may be characterized as the saddle point of a
convex-concave function. Here a different saddle function is used that
provides minimum residual variational principles and computable error bounds
for approximate solutions of (2.2).

Let $A_S := (A + A^T)/2$ and $B := (A - A^T)/2$ be the symmetric and
skew-symmetric parts of $A$, $D$ be a diagonal positive definite matrix, and
$C := A_S - D$. Then $C$ is a real symmetric matrix and equation (2.2) may
be written

$$(B + C + D)x = f. \tag{2.4}$$

When $C_1$, $C_2$ are real symmetric matrices, we say that $C_2 \le C_1$ (respectively,
$C_2 < C_1$) provided $\langle (C_1 - C_2)x, x\rangle \ge 0$ for all $x \in \mathbb{R}^n$ (or $\langle (C_1 - C_2)x, x\rangle > 0$
for all nonzero $x \in \mathbb{R}^n$).

Consider the function $\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by

$$\mathcal{L}(x,y) = \langle (A_S - D/2)x, x\rangle + \langle f, y - x\rangle - \langle (B+C)x, y\rangle - \tfrac{1}{2}\langle Dy, y\rangle. \tag{2.5}$$

A point $(\hat{x}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^n$ is said to be a saddle point of $\mathcal{L}$ provided

$$\mathcal{L}(\hat{x}, y) \le \mathcal{L}(\hat{x}, \hat{y}) \le \mathcal{L}(x, \hat{y}) \quad \text{for all } (x,y) \in \mathbb{R}^n \times \mathbb{R}^n.$$


The following result shows that, provided $D$ is small enough, the function
$\mathcal{L}$ defined above has a unique saddle point, and that this saddle point
characterizes the solutions of the linear equation (2.2).


Theorem 2.1. Assume $A$, $B$, $C$, $D$ as above, (2.1) holds, and $D < 2A_S$.
Then the function $\mathcal{L}$ defined by (2.5) has a unique saddle point $(\hat{x}, \hat{y}) \in
\mathbb{R}^n \times \mathbb{R}^n$ with $\hat{x}$ being the unique solution of (2.2).


Proof. Given $y$ in $\mathbb{R}^n$, $\mathcal{L}(\cdot, y)$ is continuous. When $D < 2A_S$, this function is
strictly convex and there is a $\delta > 0$ and a function $c$ such that

$$\mathcal{L}(x,y) \ge \delta\|x\|^2 - \bigl[\|f\| + \|(C - B)y\|\bigr]\,\|x\| + c(y).$$

Hence $\mathcal{L}(\cdot, y)$ is coercive for each $y \in \mathbb{R}^n$. Similarly $-\mathcal{L}(x, \cdot)$ is continuous,
strictly convex, and coercive for each $x \in \mathbb{R}^n$. The usual minimax theorem
(see Theorem 49A in Zeidler [9] for example) implies that $\mathcal{L}$ has a saddle
point in $\mathbb{R}^n \times \mathbb{R}^n$.

This function $\mathcal{L}$ is continuously differentiable, so the saddle point is a
solution of the system

$$\nabla_x \mathcal{L}(x,y) = (2A_S - D)x + (B - C)y - f = 0, \tag{2.6}$$
$$\nabla_y \mathcal{L}(x,y) = f - (B + C)x - Dy = 0. \tag{2.7}$$

Here $\nabla$ is the usual gradient operator. Add these equations to obtain

$$(C + D - B)(x - y) = 0.$$

Take inner products with $(x - y)$; then

$$0 = \langle A_S(x - y), x - y\rangle \ge a_0\|x - y\|^2$$

as $A_S = C + D$, (2.1) holds, and $\langle Bz, z\rangle = 0$ for all $z$. Thus the saddle point
must have the form $(\hat{x}, \hat{x})$ and (2.7) shows that the equation satisfied by $\hat{x}$ is
(2.2). The uniqueness of solutions follows from (2.1).


Note that this is a (quite different) existence-uniqueness proof for (2.2) 
when (2.1) holds. It is based on elementary calculus and the minimax theo- 
rem. The analysis in Auchmuty [3] used a different saddle function that did 
not involve a matrix of the form D. The introduction of this matrix leads 
to expressions that are more practical for numerical computation than those 
described in [3]. Strictly speaking, it is not necessary that $D$ be diagonal; it
suffices that $D$ be symmetric and positive definite and that the inverse $D^{-1}$
be known explicitly for use in formulae to be described in the next section.
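
The saddle inequalities of Theorem 2.1 can be checked numerically on a small example. The sketch below is illustrative only: the $2 \times 2$ matrix `A`, the right-hand side `f`, the diagonal matrix `D`, and the sample points are assumptions chosen so that $D < 2A_S$ holds and the exact solution is $(1, 1)$; none of them come from the text.

```python
# Illustrative 2x2 data (an assumption for this sketch, not taken from the text).
# The symmetric part of A is diag(2, 3), so (2.1) holds with a_0 = 2.
A = [[2.0, 1.0], [-1.0, 3.0]]
f = [3.0, 2.0]
D = [[1.0, 0.0], [0.0, 2.0]]  # diagonal, positive definite, and D < 2 A_S
n = len(f)

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

# Symmetric part A_S, skew part B, and C = A_S - D, as defined before (2.4).
A_S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
B = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
C = [[A_S[i][j] - D[i][j] for j in range(n)] for i in range(n)]
P = [[A_S[i][j] - 0.5 * D[i][j] for j in range(n)] for i in range(n)]  # A_S - D/2
BC = [[B[i][j] + C[i][j] for j in range(n)] for i in range(n)]         # B + C

def saddle_L(x, y):
    """The saddle function (2.5)."""
    y_minus_x = [yi - xi for xi, yi in zip(x, y)]
    return (dot(matvec(P, x), x) + dot(f, y_minus_x)
            - dot(matvec(BC, x), y) - 0.5 * dot(matvec(D, y), y))

x_hat = [1.0, 1.0]  # solves A x = f for the data above
center = saddle_L(x_hat, x_hat)

# Check L(x_hat, y) <= L(x_hat, x_hat) <= L(x, x_hat) at a few perturbed points.
offsets = [(-1.0, 0.5), (0.3, -0.7), (2.0, 2.0), (-0.1, 0.0)]
points = [[x_hat[0] + s, x_hat[1] + t] for s, t in offsets]
upper_ok = all(saddle_L(x_hat, p) <= center + 1e-12 for p in points)
lower_ok = all(saddle_L(p, x_hat) >= center - 1e-12 for p in points)
print(center, upper_ok, lower_ok)
```

A finite set of sample points cannot prove the saddle property; it only gives a quick sanity check of the formula (2.5) and of the sign conventions for $B$ and $C$.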


2.3 Variational Principles for Nonsymmetric Linear 
Equations 


A convex-concave saddle problem defines a pair of associated dual variational
principles. See Auchmuty [2], Section 3 or Ekeland and Temam [5], Chapter
III for general descriptions of such constructions and their properties.

Define the function $G : \mathbb{R}^n \to \mathbb{R}$ by

$$G(x) := \sup_{y \in \mathbb{R}^n} \mathcal{L}(x,y). \tag{2.8}$$

The primal problem associated with $\mathcal{L}$ is to minimize $G$ on $\mathbb{R}^n$. The
maximization of $\mathcal{L}$ with respect to $y$ is straightforward and yields

$$G(x) = \langle (A_S - D/2)x, x\rangle - \langle f, x\rangle + \tfrac{1}{2}\langle D^{-1}(f - (B+C)x),\, f - (B+C)x\rangle. \tag{2.9}$$

The essential results about this minimization problem may be summarized
as follows.


Theorem 2.2. Assume $A$, $B$, $C$, $D$ as above and (2.1) holds. Then the function
$G$ defined by (2.9) is strictly convex and coercive on $\mathbb{R}^n$. It has a unique
minimizer $\hat{x}$ on $\mathbb{R}^n$ with $\hat{x}$ being the unique solution of (2.2).


Proof. First note that $D^{-1}$ is diagonal with positive entries on the diagonal,
as this holds for $D$. Straightforward algebra leads to the formula

$$\begin{aligned}
G(x) &= \tfrac{1}{2}\langle D^{-1}(f - Ax), f - Ax\rangle && (2.10)\\
     &= \tfrac{1}{2}\langle D^{-1}Ax, Ax\rangle - \langle A^T D^{-1} f, x\rangle + \tfrac{1}{2}\langle D^{-1} f, f\rangle. && (2.11)
\end{aligned}$$

From (2.1) and Cauchy’s inequality one has $\|Ax\| \ge a_0\|x\|$. Let $d_M$ be the
largest entry in the diagonal matrix $D$; then

$$G(x) \ge \frac{a_0^2}{2 d_M}\|x\|^2 - c_1\|f\|\,\|x\| + c_2 \quad \text{for all } x \in \mathbb{R}^n, \tag{2.12}$$

where $c_1$, $c_2$ are constants. These formulae imply that $G$ is strictly convex and
coercive on $\mathbb{R}^n$ as it is quadratic. It is continuous so $G$ attains its infimum
and the minimizer is unique. The expression (2.11) is G-differentiable with

$$\nabla G(x) = A^T D^{-1}(Ax - f).$$

This must be zero at a minimizer. Because $A$ and $D$ are nonsingular, this
implies that the minimizers satisfy (2.2) as claimed.


This function $G$ was defined from general considerations of duality
associated with the function $\mathcal{L}$. Let $r(x) := f - Ax$ be the residual of $x$ with
respect to the equation (2.2); then (2.11) implies that

$$G(x) = \tfrac{1}{2}\langle D^{-1}r, r\rangle \ge \frac{1}{2 d_M}\|r(x)\|^2 \quad \text{for all } x \in \mathbb{R}^n. \tag{2.13}$$

This shows that the variational principle of minimizing $G$ on $\mathbb{R}^n$ is a
weighted (or preconditioned) minimum residual principle for the problem
of solving (2.2). This enables us to obtain an a posteriori error estimate for
approximate solutions of (2.2) in terms of the values of $G(x)$. When $\hat{x}$ is the solution of
(2.2) and $x \in \mathbb{R}^n$, then $A(x - \hat{x}) = -r$, so upon taking inner products with
$x - \hat{x}$ and using (2.1) one sees that

$$a_0\|x - \hat{x}\|^2 \le |\langle r, x - \hat{x}\rangle| \le \|r\|\,\|x - \hat{x}\|.$$

Rearrange this; then (2.13) implies

$$\|x - \hat{x}\| \le a_0^{-1}\|r\| \le a_0^{-1}\sqrt{2 d_M\, G(x)} \quad \text{for all } x \in \mathbb{R}^n. \tag{2.14}$$

This estimate does not require knowledge of a condition number for $A$; it
requires only a lower bound on $a_0$. Note that $D$ here is any positive definite
diagonal matrix, but the values of $G(x)$ depend on both $D$ and $D^{-1}$. This suggests that in practice
one may wish to investigate the dependence of these bounds on the choice of
$D$ for a particular matrix $A$ in (2.2).
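
The a posteriori estimate (2.14) is straightforward to evaluate. The following sketch compares the true error with the bound at a few trial points; the data `A`, `f`, the diagonal `d` of $D$, the value `a0`, and the trial points are assumptions made for illustration, chosen so that the exact solution is known in closed form.

```python
import math

# Illustrative data (assumed): A has symmetric part diag(2, 3), hence a_0 = 2.
A = [[2.0, 1.0], [-1.0, 3.0]]
f = [3.0, 2.0]
d = [1.0, 2.0]      # diagonal entries of D (an assumed choice)
a0 = 2.0            # coercivity constant in (2.1) for this A
x_hat = [1.0, 1.0]  # exact solution of A x = f for this data

def residual(x):
    """r(x) = f - A x."""
    return [f[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(f))]

def G(x):
    """Weighted residual functional (2.10): 0.5 * <D^{-1} r, r>."""
    return 0.5 * sum(ri * ri / di for ri, di in zip(residual(x), d))

def error_bound(x):
    """Right-hand side of (2.14): sqrt(2 d_M G(x)) / a_0."""
    return math.sqrt(2.0 * max(d) * G(x)) / a0

def true_error(x):
    """Euclidean distance from x to the exact solution."""
    return math.sqrt(sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat)))

trial_points = [[0.0, 0.0], [1.1, 0.9], [-2.0, 4.0], [1.0, 1.0]]
report = [(x, true_error(x), error_bound(x)) for x in trial_points]
for x, err, bound in report:
    print(x, err, bound)
```

Rerunning the sketch with different entries in `d` is one simple way to explore how the choice of $D$ affects the tightness of the bound for a given matrix.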

The dual problem associated with the saddle point problem for $\mathcal{L}$ is to
maximize $H : \mathbb{R}^n \to \mathbb{R}$ defined by

$$H(y) := \inf_{x \in \mathbb{R}^n} \mathcal{L}(x,y). \tag{2.15}$$

The explicit formula for $H$ analogous to (2.9) is

$$H(y) = -\tfrac{1}{2}\langle (2C + D)^{-1}(f + (C - B)y),\, f + (C - B)y\rangle + \langle f, y\rangle - \tfrac{1}{2}\langle Dy, y\rangle. \tag{2.16}$$

The use of this function requires the evaluation of $(2C + D)^{-1}$, which usually
is more effort than the determination of $D^{-1}$, so we just concentrate on use
of the primal problem. This dual functional has very similar properties to
$-G$.
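
As a consistency check on (2.10) and (2.16), one can verify weak duality, $H(y) \le G(x)$ for all $x$ and $y$, on sample points. The sketch below is a hedged illustration: the data are assumed, the helper `inv2` only handles $2 \times 2$ matrices, and in this particular example $2C + D$ happens to be diagonal, which hides the extra cost that the text mentions for the general case.

```python
# Illustrative data (assumed, not from the text).
A = [[2.0, 1.0], [-1.0, 3.0]]
f = [3.0, 2.0]
D = [[1.0, 0.0], [0.0, 2.0]]
n = len(f)

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

A_S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
B = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
C = [[A_S[i][j] - D[i][j] for j in range(n)] for i in range(n)]
K_inv = inv2([[2.0 * C[i][j] + D[i][j] for j in range(n)] for i in range(n)])  # (2C + D)^{-1}
D_inv = inv2(D)
CmB = [[C[i][j] - B[i][j] for j in range(n)] for i in range(n)]  # C - B

def G(x):
    """Primal functional in the residual form (2.10)."""
    r = [fi - ai for fi, ai in zip(f, matvec(A, x))]
    return 0.5 * dot(matvec(D_inv, r), r)

def H(y):
    """Dual functional (2.16)."""
    g = [fi + ci for fi, ci in zip(f, matvec(CmB, y))]
    return -0.5 * dot(matvec(K_inv, g), g) + dot(f, y) - 0.5 * dot(matvec(D, y), y)

samples = [[0.0, 0.0], [1.0, 1.0], [-2.0, 4.0], [0.5, 1.5]]
gap_ok = all(H(y) <= G(x) + 1e-12 for x in samples for y in samples)
print(gap_ok, G([1.0, 1.0]), H([1.0, 1.0]))
```

Both functionals vanish at the solution $(1, 1)$ of this example, so the duality gap closes there.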


2.4 Variational Principles for Semilinear Equations I 


The preceding analysis generalizes to large classes of semilinear
finite-dimensional systems of equations. First consider an equation of the form

$$Ax = F(x), \tag{2.17}$$

where $A$ is a real $n \times n$ matrix which is coercive but need not be symmetric,
and $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuous function. Let $B$, $C$, and $D$ be matrices
defined as before, so that this equation may be written as

$$F(x) - (B + C)x = Dx = \nabla q(x) \quad \text{with } q(x) := \tfrac{1}{2}\langle Dx, x\rangle. \tag{2.18}$$

Consider the function $G : \mathbb{R}^n \to \mathbb{R}$ defined by

$$G(x) := \tfrac{1}{2}\langle Dx, x\rangle + \tfrac{1}{2}\langle D^{-1}(F(x) - (B+C)x),\, F(x) - (B+C)x\rangle + \langle Cx - F(x), x\rangle \tag{2.19}$$

and the variational problem of minimizing $G$ on $\mathbb{R}^n$. Let the value of this
problem be

$$\alpha(G) := \inf_{x \in \mathbb{R}^n} G(x).$$

This is a generalization of the variational principle described in Section 2.3
with $F(x)$ replacing $f$. The essential properties of this optimization problem
may be summarized as follows.


Theorem 2.3. Assume $A$, $B$, $C$, $D$, and $F$ as above. Then the function $G$
defined by (2.19) is continuous and has value $\alpha(G) \ge 0$. A point $\hat{x} \in \mathbb{R}^n$ is
a solution of (2.17) if and only if $\hat{x}$ minimizes $G$ on $\mathbb{R}^n$ and $G(\hat{x}) = 0$.


Proof. Let $q(x)$ be the quadratic form defined in (2.18); then $q$ is strictly
convex on $\mathbb{R}^n$ and its conjugate function is $q^*(z) := \tfrac{1}{2}\langle D^{-1}z, z\rangle$. Then
from the generalized Young inequality (see Proposition 51.2 in Zeidler [9] or
Section 2.5 in Han [7]), one sees that

$$q(x) + q^*(F(x) - (B+C)x) - \langle F(x) - (B+C)x, x\rangle \ge 0 \quad \text{for all } x \in \mathbb{R}^n. \tag{2.20}$$

This and the fact that $B$ is skew-symmetric imply that $G(x) \ge 0$ for all
$x \in \mathbb{R}^n$. Hence $\alpha(G) \ge 0$. Equality holds in (2.20) if and only if

$$F(x) - (B+C)x = Dx.$$

This is equation (2.17) and at such a point $G(x) = 0$.


This result provides a variational principle for the possible solutions of our
equation that requires not only that the points be global minimizers but also
that the value of the problem be zero. It is a straightforward computation to
verify that the analogue of equation (2.13) remains valid for this nonlinear
problem. Specifically let $r(x) := Ax - F(x)$ be the residual of this equation
at a point $x \in \mathbb{R}^n$; then

$$G(x) \ge \frac{1}{2 d_M}\|r(x)\|^2 \quad \text{for all } x \in \mathbb{R}^n. \tag{2.21}$$

Thus the function $G$ is again a weighted residual function for this problem
and $G(x)$ small implies that the residual is small. Without further conditions
on the nonlinearity $F$ there is no guarantee that the residual being small
implies that a point $x$ is close to a solution of the original equation. To obtain
such a condition on the function $F$ we use an associated min-max problem.
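
The weighted-residual interpretation of (2.19) can be tested numerically. The sketch below evaluates $G$ from (2.19), compares it with $\tfrac{1}{2}\langle D^{-1}r, r\rangle$ for $r = Ax - F(x)$, and checks (2.21). Everything concrete in it is an assumption made for illustration: the matrix `A`, the bounded nonlinearity `F`, the diagonal `d`, and in particular the fixed-point iteration $x \leftarrow A^{-1}F(x)$, which is used here only as a convenient way to produce an approximate solution and is not a method proposed in the text.

```python
import math

# Illustrative data (assumed): symmetric part of A is diag(2, 3).
A = [[2.0, 1.0], [-1.0, 3.0]]
f = [3.0, 2.0]
d = [1.0, 2.0]  # diagonal entries of D (an assumed choice)
n = len(f)

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def F(x):
    """Hypothetical bounded nonlinearity: f + 0.5 * sin(x), componentwise."""
    return [fi + 0.5 * math.sin(xi) for fi, xi in zip(f, x)]

A_S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
B = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
C = [[A_S[i][j] - (d[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
BC = [[B[i][j] + C[i][j] for j in range(n)] for i in range(n)]  # B + C

def G(x):
    """The functional (2.19)."""
    Fx, BCx, Cx = F(x), matvec(BC, x), matvec(C, x)
    w = [Fx[i] - BCx[i] for i in range(n)]  # F(x) - (B + C) x
    return (0.5 * sum(d[i] * x[i] ** 2 for i in range(n))
            + 0.5 * sum(w[i] ** 2 / d[i] for i in range(n))
            + sum((Cx[i] - Fx[i]) * x[i] for i in range(n)))

def residual(x):
    """r(x) = A x - F(x)."""
    Ax, Fx = matvec(A, x), F(x)
    return [Ax[i] - Fx[i] for i in range(n)]

def weighted_residual(x):
    """0.5 * <D^{-1} r, r>."""
    return 0.5 * sum(ri ** 2 / di for ri, di in zip(residual(x), d))

def solve2(M, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]

# Convenience solver (not from the text): fixed-point iteration x <- A^{-1} F(x).
x = [0.0, 0.0]
for _ in range(60):
    x = solve2(A, F(x))

samples = [[0.0, 0.0], [1.0, -2.0], [3.0, 0.5], x]
identity_ok = all(abs(G(p) - weighted_residual(p)) < 1e-9 for p in samples)
bound_ok = all(G(p) >= sum(ri ** 2 for ri in residual(p)) / (2.0 * max(d)) - 1e-9
               for p in samples)
print(identity_ok, bound_ok, G(x))
```

For this assumed `F` the iteration is a contraction, so the final iterate has a residual near machine precision and $G$ is correspondingly close to zero there, in line with Theorem 2.3.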
Consider the function $M : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by

$$M(x,y) := \langle (A_S - D/2)x, x\rangle + \langle F(x), y - x\rangle - \langle (B+C)x, y\rangle - \tfrac{1}{2}\langle Dy, y\rangle. \tag{2.22}$$

A point $(\hat{x}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^n$ is said to be a min-max point of $M$ provided

$$M(\hat{x}, \hat{y}) = \inf_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^n} M(x,y). \tag{2.23}$$

Obviously a saddle point of $M$ is a min-max point but the converse need not
hold; see Auchmuty [4], Section 2 for descriptions of this. First note that this
expression for $M$ implies that

$$G(x) = \sup_{y \in \mathbb{R}^n} M(x,y), \tag{2.24}$$

so if $(\hat{x}, \hat{y})$ is a min-max point of $M$, then $\hat{x}$ is a minimizer of $G$.
Here, and in the next section, we need a general result about the existence
of such min-max points. The theorem that is used may be stated as follows.


Theorem 2.4. Let $K$ be a nonempty closed convex set in $\mathbb{R}^n$ and assume
that $M : K \times K \to \mathbb{R}$ satisfies

(i) $M(x,x) \le 0$ for all $x$ in $K$.
(ii) For each $x$ in $K$, $M(x, \cdot)$ is concave on $K$.
(iii) For each $y$ in $K$, $M(\cdot, y)$ is l.s.c. on $K$.
(iv) For some $y_0$ in $K$, the set $E_0 := \{x \in K : M(x, y_0) \le 0\}$ is bounded
in $K$.

Then there is an $x_0 \in K$ satisfying $\sup_{y \in K} M(x_0, y) \le 0$.


This result is a specialization of Theorem 3 in Auchmuty [4] and is
essentially due to Ky Fan [6].


Theorem 2.5. Suppose $A$, $B$, $D$, and $F$ as above and the set of points that
satisfy

$$\langle (A_S - D/2)x, x\rangle - \langle F(x), x\rangle \le 0 \tag{2.25}$$

is bounded in $\mathbb{R}^n$. Then $M$ has a min-max point $(\hat{x}, \hat{y})$ with $G(\hat{x}) = 0$ and $\hat{x}$
is a solution of (2.17).


Proof. This is proved using Fan’s theorem with $K = \mathbb{R}^n$ and $M$ defined by
(2.22). Condition (i) holds with $M(x,x) = 0$ as $A_S - D/2 = C + D/2$ and
$B$ is skew-symmetric; (ii) holds as $D$ is positive definite; and (iii) holds as
$M$ is continuous. Take $y_0 = 0$ in condition (iv); then the criterion for $E_0$
to be bounded is that the set of points for which (2.25) holds is bounded.
Thus Theorem 2.4 yields that there is an $\hat{x} \in \mathbb{R}^n$ for which $G(\hat{x}) \le 0$. From
Theorem 2.3, $G(x) \ge 0$ for all $x$, so there is a minimizer of $G$ with $G(\hat{x}) = 0$
and it is a solution of (2.17).


Suppose the nonlinear mapping $F$ satisfies the following condition.


Condition C1: There is an $R > 0$ and an $a_1$ such that $a_1 < a_0$ and

$$\langle F(x), x\rangle \le a_1\|x\|^2 \quad \text{for all } \|x\| \ge R. \tag{2.26}$$

This condition holds both in the linear case where $F(x) = f$ is constant
and when $F$ is nonlinear and bounded on $\mathbb{R}^n$.


Corollary 2.6. Assume $A$ satisfies (2.1), $F$ is continuous and C1 holds.
Then there is at least one solution $\hat{x}$ of (2.17) and $\hat{x}$ minimizes $G$ on $\mathbb{R}^n$.


Proof. When (2.26) holds then the set of points for which (2.25) holds is
bounded provided $D$ is chosen to be sufficiently small. Thus Theorem 2.5
yields this result.
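
The phrase "sufficiently small" in this proof can be made explicit. The following short derivation is a sketch that uses only (2.1), (2.26), and the largest diagonal entry $d_M$ of $D$; the resulting threshold $d_M < 2(a_0 - a_1)$ is worked out here and is not quoted from the text.

```latex
% For \|x\| \ge R, use \langle A_S x, x\rangle = \langle Ax, x\rangle \ge a_0\|x\|^2 from (2.1),
% \langle Dx, x\rangle \le d_M\|x\|^2, and \langle F(x), x\rangle \le a_1\|x\|^2 from (2.26):
\begin{aligned}
\langle (A_S - D/2)x, x\rangle - \langle F(x), x\rangle
  &\ge a_0\|x\|^2 - \tfrac{1}{2} d_M\|x\|^2 - a_1\|x\|^2 \\
  &= \bigl(a_0 - a_1 - \tfrac{1}{2} d_M\bigr)\|x\|^2 .
\end{aligned}
% If d_M < 2(a_0 - a_1), the right-hand side is strictly positive for \|x\| \ge R > 0,
% so every x satisfying (2.25) lies in the open ball \|x\| < R.
%
% Bounded case: if \|F(x)\| \le m for all x, then for any a_1 \in (0, a_0),
% \langle F(x), x\rangle \le m\|x\| \le a_1\|x\|^2 whenever \|x\| \ge R := m / a_1,
% which is Condition C1.
```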


2.5 Variational Principles for Semilinear Equations II


There is another class of semilinear, but not necessarily potential, equations
for which there are useful variational principles whose minima provide
solutions of the equations.

When a function $V$ is continuous and coercive on $\mathbb{R}^n$, then it is bounded
below on $\mathbb{R}^n$ and its conjugate function $V^* : \mathbb{R}^n \to \mathbb{R}$ defined by

$$V^*(z) := \sup_{x \in \mathbb{R}^n} \bigl[\langle z, x\rangle - V(x)\bigr]$$

is convex, lower semicontinuous, and finite for all $z \in \mathbb{R}^n$.

Consider the problem of solving equations of the form

$$Ax + \nabla V(x) = 0, \tag{2.27}$$

where $A$ is a real $n \times n$ matrix and $V$ is assumed to be $C^1$ and coercive on $\mathbb{R}^n$.
This may be regarded as a special case of the system treated in the previous
section with $F(x) = -\nabla V(x)$ but now $A$ need not be coercive.

Define $\mathcal{J} : \mathbb{R}^n \to \mathbb{R}$ by

$$\mathcal{J}(x) := V(x) + V^*(-Ax) + \langle A_S x, x\rangle. \tag{2.28}$$

When $V$, $A$ are as above then this function is finite for each $x \in \mathbb{R}^n$ and
lower semicontinuous (l.s.c.). Consider the problem of minimizing $\mathcal{J}$ on $\mathbb{R}^n$
and finding

$$\alpha(\mathcal{J}) := \inf_{x \in \mathbb{R}^n} \mathcal{J}(x).$$

This problem has similar properties to that of the problem described in
Section 2.4.


Theorem 2.7. Assume $V$ is $C^1$ and coercive on $\mathbb{R}^n$ and $A$ is a real $n \times n$
matrix. Then the function $\mathcal{J}$ defined by (2.28) is lower semicontinuous and
finite at each $x$ and $\alpha(\mathcal{J}) \ge 0$. A point $\hat{x} \in \mathbb{R}^n$ is a solution of (2.27) if
and only if it minimizes $\mathcal{J}$ on $\mathbb{R}^n$ and $\mathcal{J}(\hat{x}) = 0$.


Proof. The function $V^*$ is finite-valued, convex, and lower semicontinuous
from our assumptions on $V$ and because it is the supremum of a family of
such functions. Thus $\mathcal{J}$ is finite and l.s.c. on $\mathbb{R}^n$. The generalized Young’s
inequality implies that $\mathcal{J}(x) \ge 0$ for all $x$ and that $\mathcal{J}(x) = 0$ if and only if $x$
satisfies

$$-Ax \in \partial V(x).$$

Here $\partial V(x)$ is the subdifferential of $V$ at $x$. Because $V$ is G-differentiable,
$\partial V(x) = \{\nabla V(x)\}$, so the result follows.
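
A simple worked instance may help to fix ideas. The sketch below assumes the quadratic potential $V(x) = \tfrac{1}{2}\|x\|^2 - \langle g, x\rangle$, whose conjugate is $V^*(z) = \tfrac{1}{2}\|z + g\|^2$, together with a skew-symmetric matrix `A`, so that (2.1) fails. These choices are illustrative assumptions, made because they give closed forms; with them (2.27) reduces to the linear system $(A + I)x = g$ and $\mathcal{J}(x)$ equals $\tfrac{1}{2}\|(A + I)x - g\|^2$.

```python
# Illustrative data (assumed): A is skew-symmetric, so A_S = 0 and (2.1) fails.
A = [[0.0, 1.0], [-1.0, 0.0]]
g = [2.0, 0.0]

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def V(x):
    """Assumed potential V(x) = 0.5 |x|^2 - <g, x> (C^1, convex, coercive)."""
    return 0.5 * dot(x, x) - dot(g, x)

def V_star(z):
    """Conjugate of the assumed V in closed form: 0.5 |z + g|^2."""
    w = [zi + gi for zi, gi in zip(z, g)]
    return 0.5 * dot(w, w)

def J(x):
    """The functional (2.28); note that <A_S x, x> = <A x, x>."""
    Ax = matvec(A, x)
    return V(x) + V_star([-a for a in Ax]) + dot(Ax, x)

def least_squares_form(x):
    """Closed form valid for this V only: 0.5 |(A + I) x - g|^2."""
    Ax = matvec(A, x)
    e = [Ax[i] + x[i] - g[i] for i in range(len(x))]
    return 0.5 * dot(e, e)

x_hat = [1.0, 1.0]  # solves A x + grad V(x) = (A + I) x - g = 0
samples = [[0.0, 0.0], [1.0, 1.0], [-3.0, 2.0], [0.5, 4.0]]
nonneg_ok = all(J(p) >= -1e-12 for p in samples)
match_ok = all(abs(J(p) - least_squares_form(p)) < 1e-9 for p in samples)
print(J(x_hat), nonneg_ok, match_ok)
```

In this special case the variational principle of Theorem 2.7 is an ordinary least-squares problem; for a non-quadratic $V$ the conjugate $V^*$ generally has to be evaluated by solving an inner maximization problem.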


Here again, $\mathcal{J}$ provides a variational principle for the solutions of (2.27)
for which the minimizers of $\mathcal{J}$ must satisfy an extra condition to actually
be a solution of the equation. This result did not require that $V$ be convex.
When $V$ is also convex, then some existence results for this problem can be
obtained using min-max methods. Consider the function $W : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$
defined by

$$W(x,y) := V(x) + \langle A_S x, x\rangle - \langle Ax, y\rangle - V(y). \tag{2.29}$$

This function has the property that

$$\mathcal{J}(x) = \sup_{y \in \mathbb{R}^n} W(x,y). \tag{2.30}$$


The following theorem describes conditions on the function $V$ that
guarantee existence of a minimizer of $\mathcal{J}$ that obeys the criteria of Theorem 2.7.


Theorem 2.8. Suppose $V$ is convex, $C^1$, and coercive on $\mathbb{R}^n$, and the set of
points satisfying

$$V(x) + \langle A_S x, x\rangle \le V(0) \tag{2.31}$$

is bounded in $\mathbb{R}^n$. Then $W$ has a min-max point $(\hat{x}, \hat{y})$ and $\mathcal{J}(\hat{x}) = 0$. If,
in addition, $A$ satisfies (2.1), then $W$ is convex-concave, $(\hat{x}, \hat{y})$ is a saddle
point of $W$, $\hat{x}$ is the unique minimizer of $\mathcal{J}$, and there is a unique solution
of (2.27).


Proof. To prove the first part of this, the conditions of Theorem 2.4 are
verified for $W$. Our assumptions on $V$ imply that (i), (ii), and (iii) all hold.
Take $y_0 = 0$ in (iv); then the criteria above guarantee that the set of points
which satisfy (2.31) is bounded. Thus Theorem 2.4 says that there is an $\hat{x}$
such that $\mathcal{J}(\hat{x}) \le 0$. From Theorem 2.7, $\mathcal{J}(x) \ge 0$ for all $x$, so $\mathcal{J}(\hat{x}) = 0$. For
a given $\hat{x}$, the supremum of $W(\hat{x}, \cdot)$ is attained as $V$ is convex and coercive,
hence there is a min-max point. When $A$ satisfies (2.1), $W(\cdot, y)$ is strictly
convex on $\mathbb{R}^n$ for each $y$ and the remaining results follow from the usual
saddle point theorem.


The condition (2.31) says that provided

$$\liminf_{\|x\| \to \infty} \bigl[V(x) + \langle A_S x, x\rangle\bigr] > V(0) \tag{2.32}$$

then the equation (2.27) has at least one solution, without requiring coercivity
of $A$. In this case, coercivity of $A$ yields uniqueness of the solution and dual
variational principles for the problem may be described.

The results described here can be generalized to variational principles and 
solvability results for equations between a real reflexive Banach space X and 
its dual space X* using similar constructions and proofs. The relevant saddle 
point theorems in [9], or the min-max theorems of [4] hold in this generality. 
When natural conditions are imposed on $A$, $F$ so that these theorems hold,
results analogous to those described here for the finite-dimensional case may
be stated.


Chapter 3


Adaptive Finite Element Solution of 
Variational Inequalities with 
Application in Contact Problems 


Viorel Bostan and Weimin Han 


Summary. In this chapter, we perform a posteriori error analysis for the 
adaptive finite element solution of several variational inequalities, including 
elliptic variational inequalities of the second kind and corresponding qua- 
sistatic variational inequalities. A general framework for a posteriori error es- 
timation is established by using duality theory in convex analysis. We then de- 
rive a posteriori error estimates of residual type and of recovery type, through 
particular choices of the dual variable present in the general framework. The 
error estimates are guaranteed to be reliable. Efficiency of the error estimators 
is theoretically investigated and numerically validated. Detailed derivation
and analysis of the error estimates are given for a model elliptic variational
inequality. The results extend in a straightforward way to other
elliptic variational inequalities of the second kind, and we present such an
extension for a problem arising in frictional contact. Moreover, we use a
quasistatic contact problem as an example to illustrate how to extend the a
posteriori error analysis to time-dependent variational inequalities.
Numerous numerical examples are included to illustrate the effectiveness of the a
posteriori error estimates in adaptive solutions of the variational inequalities.


Key words: A posteriori error estimation, adaptive finite element solution,
elliptic variational inequality, quasistatic variational inequality, frictional
contact, duality, reliability, efficiency


3.1 Introduction 


In this chapter, we present some theoretical and numerical results on a
posteriori error estimation and adaptive finite element solution of elliptic
variational inequalities of the second kind, as well as corresponding qua-
sistatic variational inequalities, especially those arising in frictional contact
problems.

The general framework for a posteriori error estimation of the chapter 
works for any Galerkin solutions of the variational inequalities. However, 
we specifically choose the finite element method for approximation, as the 
method today is the dominant numerical method for solving most problems 
in structural and fluid mechanics. It is widely applied to both linear and 
nonlinear problems. General mathematical theory of finite element methods 
can be found in [4, 26, 27, 58, 64], among others. The textbook [51] offers
an easily accessible mathematical introduction to finite element methods,
whereas the two recent textbooks [17, 18] provide deeper mathematical theory
together with more recent research developments such as
multigrid methods. Traditionally, convergence of finite element solutions is
achieved through mesh refinement with the use of a piecewise low degree 
polynomial. Because h is usually used to denote the mesh size, the traditional 
finite element method is also termed the h-version finite element method. 

On the other hand, convergence of the method can also be achieved by 
using piecewise increasingly higher degree polynomials over relatively coarse 
finite element meshes, leading to the p-version finite element method. De- 
tailed discussion of the p-version finite element method can be found in [66]. 
The p-version method is more efficient in areas where the solution is smooth, 
so it is natural to combine the ideas of the p-version and the h-version to 
make the finite element method very efficient on many problems. A well- 
known result regarding the h-p-version finite element method is the expo- 
nential convergence rate for solving elliptic boundary value problems with 
corner singularities, under proper combinations of local polynomial degrees 
and element sizes. 

Comprehensive mathematical theory of the p-version and h-p-version fi- 
nite element methods with applications in solid and fluid mechanics can be 
found in [63]. Mixed and hybrid finite element methods are often used in solv-
ing boundary value problems with constraints and higher-order differential
equations. Mathematical theory of these methods can be found in [19, 62].
Several monographs are available on the numerical solution of Navier-Stokes 
equations by the finite element method (see, e.g., [33]). Theory of the finite 
element method for solving parabolic problems can be found in [67] and more 
recently in [68]. Finally, we list a few representative engineering books on the 
finite element method, [11, 50, 76, 77]. The reader is referred to two historical 
notes [57, 75] on the development of the finite element method. 

For practical use of a numerical method, one important issue is the assess-
ment of the reliability and accuracy of the numerical solution. The reliability
of the numerical solution hinges on our ability to estimate errors after the so-
lution is computed; such an error analysis is called a posteriori error analysis.
A posteriori error estimates provide quantitative information on the accuracy
of the solution and are the basis for the development of automatic, adaptive
solution procedures.

The research on a posteriori error estimation and adaptive mesh refine- 
ment for the finite element method began in the late 1970s. The pioneering 
work on the topic was done in [5, 6]. Since then, a posteriori error analysis 
and adaptive computation in the finite element method have attracted many 
researchers, and a variety of different a posteriori error estimates have been 
proposed and analyzed. In a typical a posteriori error analysis, after a finite 
element solution is computed, the solution is used to compute element error 
indicators and an error estimator. The element error indicator represents the 
contribution of the element to the error in the computation of some quantity 
by the finite element solution, and is used to indicate if the element needs to 
be refined in the next adaptive step. The error estimator provides an estimate 
of the error in the computation of the quantity of the finite element solution, 
and thus can be used as a stopping criterion for the adaptive procedure. Of- 
ten, the error estimator is computed as an aggregation of the element error 
indicators, and one usually only speaks of error estimators. Most error esti- 
mators can be classified into residual type, where various residual quantities 
(residual of the equation, residual from derivative discontinuity, residual of 
material constitutive laws, etc.) are used, and recovery type, where a recovery 
operator is applied to the (discontinuous) gradient of the finite element solu- 
tion and the difference of the two is used to assess the error. Error estimators 
have also been derived based on the use of hierarchic bases or equilibrated 
residual. Two desirable properties of an a posteriori error estimator are relia- 
bility and efficiency. Reliability requires the actual error to be bounded by a 
constant multiple of the error estimator, up to perhaps a higher-order term, 
so that the error estimator provides a reliable error bound. Efficiency requires 
the error estimator to be bounded by a constant multiple of the actual error, 
again perhaps up to a higher-order term, so that the actual error is not
overestimated by the error estimator. The study and application of a posteriori
error analysis is an active research area, and the related literature is
growing quickly. Some comprehensive summary accounts can be found, in
chronological order, in [70, 1, 7].
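
The loop described above (compute a solution, evaluate element indicators, aggregate them into an estimator, refine the marked elements, stop when the estimator is small) can be sketched in a few lines. The example below is deliberately not a finite element computation: it adapts a 1D mesh for piecewise linear interpolation of a given function, and the indicator (scaled midpoint interpolation defect), the marking rule (refine elements whose indicator is at least half the maximum), and all names are assumptions chosen only to illustrate the control flow.

```python
import math

def adapt(func, a, b, tol, max_steps=50):
    """Bisect marked elements until the aggregated indicator is at most tol.

    Returns the final mesh (sorted list of nodes) and the final estimator.
    """
    mesh = [a, b]
    estimator = float("inf")
    for _ in range(max_steps):
        # Element indicators: midpoint defect of the linear interpolant,
        # scaled by the square root of the element length.
        etas = []
        for left, right in zip(mesh, mesh[1:]):
            mid = 0.5 * (left + right)
            interp = 0.5 * (func(left) + func(right))
            etas.append(abs(func(mid) - interp) * math.sqrt(right - left))
        # Error estimator: aggregation of the element indicators.
        estimator = math.sqrt(sum(e * e for e in etas))
        if estimator <= tol:  # stopping criterion
            break
        # Marking: refine every element whose indicator is at least half the maximum.
        threshold = 0.5 * max(etas)
        new_mesh = [mesh[0]]
        for (left, right), eta in zip(zip(mesh, mesh[1:]), etas):
            if eta >= threshold:
                new_mesh.append(0.5 * (left + right))
            new_mesh.append(right)
        mesh = new_mesh
    return mesh, estimator

# The square root has a singular derivative at 0, so refinement concentrates there.
mesh, estimator = adapt(math.sqrt, 0.0, 1.0, 1e-3)
print(len(mesh) - 1, estimator, mesh[1] - mesh[0], mesh[-1] - mesh[-2])
```

In an actual adaptive finite element code the indicator would be one of the residual or recovery type quantities discussed in this chapter, and each pass through the loop would include a new solve on the refined mesh.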

Initially, a posteriori error estimates were mainly developed for estimating
the finite element error in the energy norm. In recent years, error estima-
tors have also been developed for goal-oriented adaptivity. The goal-oriented
error estimators are derived to specifically estimate errors in quantities of 
interest, other than the energy norm errors. Chapter 8 of [1] is devoted to 
such error estimators. The latest development in this direction is depicted in 
[10, 32]. 

Most of the work so far on a posteriori error analysis has been devoted to
ordinary boundary value problems of partial differential equations. In appli-
cations, an important family of nonlinear boundary value and initial bound-
ary value problems is that associated with variational inequalities, that is,
problems involving either differential inequalities or inequality boundary con-
ditions. Mechanics is a rich source of variational inequalities (see, e.g., [59]),
and some examples of problems that give rise to variational inequalities are 
obstacle problems, contact problems, plasticity and viscoplasticity problems, 
Stefan problems, unilateral problems of plates and shells, and non-Newtonian 
flows involving Bingham fluids. An early comprehensive reference on the topic 
is [29], where many nonlinear boundary value problems in mechanics and 
physics are formulated and studied in the framework of variational inequal- 
ities. A concise introduction to the mathematical theory of some variational 
inequalities can be found in [54]. Numerical approximations of general vari- 
ational inequalities are studied in detail in [34, 35]. Numerical methods for 
some variational inequalities arising in mechanics are the subject of [47, 48]. 
Mathematical analysis and numerical approximations of variational inequal- 
ities arising in contact mechanics are presented in [53] (for elastic materials) 
and [46] (for viscoelastic and viscoplastic materials). In [43, 44], elastoplas- 
ticity problems are formulated and analyzed in the form of variational in- 
equalities. 

Although several standard techniques have been developed to derive and 
analyze a posteriori error estimates for finite element solutions to problems in 
the form of variational equations, they do not work directly for a posteriori 
error analysis of numerical solutions to variational inequalities due to the 
inequality feature of the problems. Nevertheless, numerous papers can be 
found on a posteriori error estimation of finite element solutions of obstacle 
problems, for example, [2, 25, 49, 55, 56, 69] (these papers consider numerical 
solutions on convex subsets of finite element spaces), as well as [31, 52] (these 
papers use a penalty approach for discrete solutions). Obstacle problems are 
so-called variational inequalities of the first kind; that is, they are inequalities 
involving smooth functionals and are posed over convex subsets. We also note 
that a posteriori error estimation is discussed in [12, 13, 65], although the 
derivations of the estimates in these papers are arguable. 

In the context of elastoplasticity with hardening, computable a posteriori 
error estimates are derived in [3, 20, 22] for the primal problem, which is 
a variational inequality of the second kind; that is, the inequality arises as 
a result of the presence of a nondifferentiable functional. These works deal 
extensively also with a priori estimates, and in the latter work a number 
of numerical examples are presented. Residual type error estimators were 
studied for an elliptic variational inequality of the second kind in [15, 16]. 

In this chapter, we derive and study some a posteriori error estimates for 
finite element solutions of elliptic variational inequalities of the second kind 
and corresponding quasistatic variational inequalities. The basic mathemat- 
ical tool we use is the duality theory in convex analysis (cf. [30, 73]). The 
duality theory has been applied to derive efficient a posteriori error estimates 
for mathematical idealizations of physical and engineering problems (see, e.g., 
[37, 38]), as well as for some numerical procedures for solving nonlinear problems,
such as the regularization techniques in [86, 42, 45], and the Kačanov
iteration method in [40, 41]. A summary account of these can be found in
[39]. In [61, 60], the technique of the duality theory was used to derive a posteriori
error estimates of the finite element method in solving boundary value
problems of some nonlinear equations. In these papers, the error bounds are 
shown to converge to zero in the limit; however, no efficiency analysis of the 
estimates is given. 

For convenience, we recall here a representative result on the duality theory
(see [30]). Let $V$, $Q$ be two normed spaces, and denote by $V^*$, $Q^*$ their dual
spaces. Assume there exists a linear continuous operator $A \in \mathcal{L}(V,Q)$, with
transpose $A^* \in \mathcal{L}(Q^*,V^*)$. Let $F$ be a functional mapping $V \times Q$ into the
extended real line $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$. Consider the minimization problem:

\[
\inf_{v \in V} F(v, Av). \tag{3.1}
\]

Define its dual problem by

\[
\sup_{q^* \in Q^*} \left[ -F^*(A^* q^*, -q^*) \right], \tag{3.2}
\]

where $F^*$ is the conjugate functional of $F$:

\[
F^*(v^*, q^*) = \sup_{v \in V,\; q \in Q} \left[ \langle v, v^* \rangle + \langle q, q^* \rangle - F(v, q) \right]. \tag{3.3}
\]

Then we have the following theorem.


Theorem 3.1. Assume

(1) $V$ is a reflexive Banach space and $Q$ a normed space,
(2) $F : V \times Q \to \overline{\mathbb{R}}$ is a proper, lower semicontinuous, convex function,
(3) $A : V \to Q$ is a linear bounded operator with its adjoint $A^* : Q^* \to V^*$,
(4) there exists $u_0 \in V$ with $F(u_0, A u_0) < \infty$ and $q \mapsto F(u_0, q)$ continuous at $A u_0$,
(5) $F(v, Av) \to +\infty$ as $\|v\| \to \infty$, $v \in V$.

Then the problem (3.1) has a solution $u \in V$, its dual (3.2) has a solution
$p^* \in Q^*$, and

\[
F(u, Au) = -F^*(A^* p^*, -p^*).
\]


Furthermore, if F is strictly convex, then a solution u of problem (3.1) is 
unique. 


The rest of the chapter is organized as follows. In Section 3.2 we intro- 
duce a model elliptic variational inequality of the second kind and its finite 
element approximation. We provide detailed derivation and analysis of a pos- 
teriori error estimates of the finite element solutions for the model problem. 
In Section 3.3 we formulate a dual problem for the model, and use the dual 
problem to establish a general a posteriori error estimate for any approximation
of the solution of the model elliptic variational inequality. The general a
posteriori error estimate features the presence of a dual variable. Different a
posteriori error estimates can be obtained with different choices of the dual 
variable. In Section 3.4, we make a particular choice of the dual variable that 
leads to a residual-based error estimate of the finite element solution of the 
model elliptic variational inequality, and explore the efficiency of the error 
estimate. In Section 3.5, we make another choice of the dual variable and 
obtain a recovery-based error estimate of the finite element solution of the 
model elliptic variational inequality. We also study the efficiency of the error 
estimator. In Section 3.6, we present some numerical results to illustrate the 
effectiveness of the estimates in adaptive solution of the elliptic variational 
inequalities. In Section 3.7, we extend the discussion to solving a steady-state 
frictional contact problem. 

Then we turn to an extension of the discussion in adaptively solving time- 
dependent variational inequalities, taking a model quasistatic variational in- 
equality as an example. We begin with an abstract quasistatic variational 
inequality, introduced in Section 3.8, which contains as special cases several 
application problems in contact mechanics and hardening plasticity. A back- 
ward Euler discretization is used to approximate the time derivative in the 
quasistatic variational inequality, leading to a sequence of semidiscretized el- 
liptic variational inequalities. An error estimate of the semidiscrete solution 
is derived. We then focus on a model quasistatic contact problem and derive 
a posteriori error estimates for finite element solutions of its semidiscretized 
approximations in Section 3.9, providing both residual type and recovery 
type error estimates. Finally, numerical results showing the effectiveness
of the error estimates in the adaptive solution of the model quasistatic contact
problem are reported in Section 3.10.

We now list some notations used repeatedly in the chapter. Let $\Omega$ be a
bounded domain in $\mathbb{R}^d$, $d \ge 1$, with Lipschitz boundary $\Gamma = \partial\Omega$. For any
open subset $\omega$ of $\Omega$ with Lipschitz boundary $\partial\omega$, we denote by $H^m(\omega)$, $L^2(\omega)$,
and $L^2(\partial\omega)$ the usual Sobolev and Lebesgue spaces with the standard norms
$\|\cdot\|_{m,\omega} = \|\cdot\|_{H^m(\omega)}$, $\|\cdot\|_{0,\omega} = \|\cdot\|_{L^2(\omega)}$, and $\|\cdot\|_{0,\partial\omega} = \|\cdot\|_{L^2(\partial\omega)}$. Also,
we make use of the standard seminorm $|\cdot|_{m,\omega}$ on $H^m(\omega)$. Throughout this
chapter we use the same notation $v$ to denote both $v \in H^1(\Omega)$ and its trace
$\gamma v \in L^2(\Gamma)$ on the boundary. We reserve the symbol $\gamma$ for the element sides.


3.2 Model Elliptic Variational Inequality and Its Finite 
Element Approximation 


In this section, we introduce a model elliptic variational inequality of the 
second kind. We comment that the ideas and techniques presented for a pos- 
teriori error analysis in solving the model problem can be extended to other 
elliptic variational inequalities of the second kind; in particular, in Section
3.7 we provide a posteriori error analysis for the finite element solution of a
steady-state frictional contact problem. 

Let $\Omega$ be a domain in $\mathbb{R}^d$, $d \ge 1$, with a Lipschitz boundary $\Gamma$. Let $\Gamma_1 \subset \Gamma$
be a relatively closed subset of $\Gamma$, and denote by $\Gamma_2 = \Gamma \setminus \Gamma_1$ the remaining part
of the boundary. We allow the extreme situation with $\Gamma_1 = \emptyset$ (i.e., $\Gamma_2 = \Gamma$)
or $\Gamma_1 = \Gamma$ (i.e., $\Gamma_2 = \emptyset$). Because the boundary $\Gamma$ is Lipschitz continuous,
the unit outward normal vector $\nu$ exists a.e. on $\Gamma$. We use $\partial/\partial\nu$ to denote
the outward normal differentiation operator, which exists a.e. on $\Gamma$. Assume
$f \in L^2(\Omega)$ and $g > 0$ are given. Over the space

\[
V = H^1_{\Gamma_1}(\Omega) = \{ v \in H^1(\Omega) : v = 0 \text{ a.e. on } \Gamma_1 \}, \tag{3.4}
\]

we define a bilinear form and two functionals:

\[
a(u,v) = \int_\Omega (\nabla u \cdot \nabla v + u\,v)\,dx, \qquad
\ell(v) = \int_\Omega f\,v\,dx, \qquad
j(v) = \int_{\Gamma_2} g\,|v|\,ds.
\]

In the space $V$, we use the $H^1(\Omega)$-norm. The model problem is the following
elliptic variational inequality of the second kind:

\[
u \in V, \quad a(u, v-u) + j(v) - j(u) \ge \ell(v-u) \quad \forall\, v \in V. \tag{3.5}
\]


This model is a so-called simplified friction problem following [34], as it can 
be viewed as a simplified version of a frictional contact problem in linearized 
elasticity (cf. Section 3.7). 

The bilinear form $a(\cdot,\cdot)$ is continuous and $V$-elliptic, the linear functional
$\ell(\cdot)$ is continuous, and the functional $j(\cdot)$ is proper, convex, and continuous.
Therefore, by a standard existence and uniqueness result for elliptic variational
inequalities of the second kind (see [34, 35]), the variational inequality
(3.5) has a unique solution. Moreover, due to the symmetry of the bilinear
form $a(\cdot,\cdot)$, the variational inequality (3.5) is equivalent to the minimization
problem: find $u \in V$ such that

\[
J(u) = \inf_{v \in V} J(v), \tag{3.6}
\]

where $J$ is the energy functional:

\[
J(v) = \tfrac{1}{2}\,a(v,v) + j(v) - \ell(v). \tag{3.7}
\]

The minimization problem (3.6) also has a unique solution.


In the analysis of a posteriori error estimators later, we need the following 
characterization of the solution $u$ of (3.5): there exists a unique $\lambda \in L^\infty(\Gamma_2)$
such that

\[
a(u,v) + \int_{\Gamma_2} g\,\lambda\,v\,ds = \ell(v) \quad \forall\, v \in V, \tag{3.8}
\]
\[
|\lambda| \le 1, \quad \lambda u = |u| \quad \text{a.e. on } \Gamma_2. \tag{3.9}
\]

The function $\lambda$ can be viewed as a Lagrange multiplier. A proof of this
characterization in the case $\Gamma_2 = \Gamma$ can be found in [34]. The argument there can
be extended straightforwardly to the more general situation considered here; 
see also the proof of Theorem 3.3. 

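It follows from (3.8) that the solution $u$ of (3.5) is the weak solution of the
boundary value problem

\[
\begin{aligned}
-\Delta u + u &= f && \text{in } \Omega, \\
u &= 0 && \text{on } \Gamma_1, \\
\frac{\partial u}{\partial \nu} + g\,\lambda &= 0 && \text{on } \Gamma_2.
\end{aligned}
\]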

We now turn to finite element approximations of the model problem. For
simplicity, we suppose that $\Omega$ has a polyhedral boundary $\Gamma$. In order to define
the finite element method for (3.5) we introduce a family of finite element
spaces $V_h \subset V$, which consist of continuous piecewise polynomials of certain
degree, corresponding to partitions $\mathcal{P}_h$ of $\Omega$ into triangular or tetrahedral
elements (other kinds of elements, such as quadrilateral elements, or hexahedral
or pentahedral elements, can be considered as well). The partitions $\mathcal{P}_h$
are compatible with the decomposition of $\Gamma$ into $\Gamma_1$ and $\Gamma_2$. In other words,
if an element side lies on the boundary, then it belongs to one of the sets $\Gamma_1$
or $\Gamma_2$. For every element $K \in \mathcal{P}_h$, let $h_K$ be the diameter of $K$ and $\rho_K$ be
the diameter of the largest ball inscribed in $K$. For a side $\gamma$ of the element
$K$, we denote by $h_\gamma$ the diameter of $\gamma$. We assume that the family of partitions
$\mathcal{P}_h$, $h > 0$, satisfies the shape regularity assumption; that is, the ratio
$h_K/\rho_K$ is uniformly bounded over the whole family by a constant $C$. Note
that the shape regularity assumption does not require that the elements be
of comparable size and thus locally refined meshes are allowed. We use $\mathcal{E}_h$
for the set of the element sides; $\mathcal{E}_{h,\Gamma}$, $\mathcal{E}_{h,\Gamma_1}$, and $\mathcal{E}_{h,\Gamma_2}$ for the subsets of the
element sides lying on $\Gamma$, $\Gamma_1$, and $\Gamma_2$, respectively; and $\mathcal{E}_{h,0} = \mathcal{E}_h \setminus \mathcal{E}_{h,\Gamma}$ for
the subset of the element sides that do not lie on $\Gamma$. Let $\mathcal{N}_h$ be the set of all
nodes in $\mathcal{P}_h$ and $\mathcal{N}_{h,0} \subset \mathcal{N}_h$ the set of free nodes; that is, those nodes that
do not lie on $\Gamma_1$. For a given element $K \in \mathcal{P}_h$, $\mathcal{N}(K)$ and $\mathcal{E}(K)$ denote the
sets of the nodes of $K$ and sides of $K$, respectively.
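
The shape regularity ratio $h_K/\rho_K$ is easy to evaluate for a single triangle, which can help when inspecting a locally refined mesh. The following Python sketch is an illustration only; the function name `shape_regularity_ratio` and the representation of vertices as coordinate pairs are assumptions made here, not part of the chapter. It uses the facts that the diameter of a triangle is its longest edge and that the inradius equals the area divided by the semiperimeter.

```python
import math


def shape_regularity_ratio(p0, p1, p2):
    """Return h_K / rho_K for the triangle K with vertices p0, p1, p2 (2D points)."""
    # Edge lengths of the triangle.
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    h_K = max(a, b, c)  # the diameter of a triangle is its longest edge
    # Area from the cross product of two edge vectors.
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    if area == 0.0:
        raise ValueError("degenerate triangle")
    # Inradius = area / semiperimeter; rho_K is the diameter of the inscribed circle.
    rho_K = 2.0 * area / (0.5 * (a + b + c))
    return h_K / rho_K


if __name__ == "__main__":
    print(shape_regularity_ratio((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))   # right triangle
    print(shape_regularity_ratio((0.0, 0.0), (1.0, 0.0), (0.5, 0.01)))  # thin triangle
```

A family of meshes is shape regular when this ratio stays below a fixed constant for every element, regardless of how small the elements become; the thin triangle above shows how the ratio grows as an element degenerates.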

The patch $\tilde{K}$ associated with any element $K$ from a partition $\mathcal{P}_h$ consists
of all elements sharing at least one vertex with $K$; that is, $\tilde{K} = \bigcup \{ K' \in \mathcal{P}_h :
K' \cap K \ne \emptyset \}$. Similarly, for any side $\gamma \in \mathcal{E}_h$, the patch $\tilde{\gamma}$ consists of the
elements sharing $\gamma$ as a common side. Note that in the case where the side $\gamma$
lies on the boundary $\Gamma$, the patch $\tilde{\gamma}$ consists of only one element. For a given
element $K \in \mathcal{P}_h$, $\nu_K$ denotes the unit outward normal vector to the sides
of $K$. When a side $\gamma$ lies on the boundary $\Gamma$, $\nu_\gamma$ denotes the unit outward
normal vector to $\Gamma$. For a side $\gamma$ in the interior, $\nu_\gamma$ is taken to be one of
the two unit normal vectors. In what follows, for any piecewise continuous
function $\varphi$ and any interior side $\gamma \in \mathcal{E}_{h,0}$, $[\varphi]_\gamma$ denotes the jump of $\varphi$ across
$\gamma$ in the direction $\nu_\gamma$; that is,

\[
[\varphi]_\gamma(x) = \lim_{t \to 0^+} \bigl( \varphi(x + t\nu_\gamma) - \varphi(x - t\nu_\gamma) \bigr) \quad \forall\, x \in \gamma.
\]


In the derivation of a posteriori error estimates we use the so-called 
weighted Clément-type interpolation operator. There are several variants 
(see, e.g., [8, 21, 23, 24, 71]) of the interpolation operator introduced by 
Clément [28]. The main difference among these interpolants lies in the way 
the interpolation is performed near the boundary. In this chapter we follow 
the approach used in [23]. 

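Corresponding to the partition $\mathcal{P}_h$, we denote by $\mathcal{N}_v \subset \mathcal{N}_h$ the set of
the element vertices, $\mathcal{N}_{v,\Gamma_1} \subset \mathcal{N}_v$ the subset of the element vertices lying on
$\Gamma_1$, and $\mathcal{N}_{v,0} = \mathcal{N}_v \setminus \mathcal{N}_{v,\Gamma_1}$ the subset of the interior vertices. Given $a \in \mathcal{N}_v$,
let $\varphi_a$ be the linear element nodal basis function associated with $a$. For each
fixed vertex $a \in \mathcal{N}_{v,\Gamma_1}$, choose $\xi(a) \in \mathcal{N}_{v,0}$ to be an interior vertex of an
element containing $a$. Let $\xi(a) = a$ if $a \in \mathcal{N}_{v,0}$. For each node $a \in \mathcal{N}_{v,0}$
define the class $I(a) = \{ \tilde{a} \in \mathcal{N}_v : \xi(\tilde{a}) = a \}$. In this way, the set of all
the vertices $\mathcal{N}_v$ is partitioned into $\mathrm{card}(\mathcal{N}_{v,0})$ classes of equivalence. For each
$a \in \mathcal{N}_{v,0}$ set

\[
\psi_a = \sum_{\tilde{a} \in I(a)} \varphi_{\tilde{a}}.
\]

Notice that $\{\psi_a : a \in \mathcal{N}_{v,0}\}$ is a partition of unity. Let $K_a = \mathrm{supp}(\psi_a)$ and
$h_a = \mathrm{diam}(K_a)$. The set $K_a$ is connected and $\psi_a \ne \varphi_a$ implies that $\Gamma_1 \cap K_a$
has a positive surface measure.

For a given $v \in L^1(\Omega)$, let

\[
v_a = \frac{\int_{K_a} v\,\psi_a\,dx}{\int_{K_a} \varphi_a\,dx}, \qquad a \in \mathcal{N}_{v,0}. \tag{3.10}
\]

Then define the interpolation operator $\Pi_h : V \to V_h$ as follows:

\[
\Pi_h v = \sum_{a \in \mathcal{N}_{v,0}} v_a\,\varphi_a. \tag{3.11}
\]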
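
The construction of $\psi_a$, the weights $v_a$ in (3.10), and the operator $\Pi_h$ in (3.11) can be traced by hand in one space dimension. The Python sketch below is only an illustration under assumptions made here: $\Omega = (0,1)$, $\Gamma_1 = \{0\}$, a mesh given by a sorted list `nodes`, and the choice $\xi(0) = 1$ (the boundary vertex is attached to its neighbouring free vertex). All function names are hypothetical. Integrals are computed with an elementwise Simpson rule, which is exact when $v$ is piecewise linear.

```python
def hat(i, x, nodes):
    """Linear nodal basis function phi_i on the 1D mesh `nodes`."""
    xi = nodes[i]
    if i > 0 and nodes[i - 1] <= x <= xi:
        return (x - nodes[i - 1]) / (xi - nodes[i - 1])
    if i < len(nodes) - 1 and xi <= x <= nodes[i + 1]:
        return (nodes[i + 1] - x) / (nodes[i + 1] - xi)
    return 0.0


def psi(a, x, nodes):
    """psi_a for a free vertex a >= 1; vertex 0 on Gamma_1 is assigned to a = 1."""
    value = hat(a, x, nodes)
    if a == 1:
        value += hat(0, x, nodes)  # class I(1) = {0, 1}
    return value


def integrate(func, nodes):
    """Elementwise Simpson rule over the whole mesh."""
    total = 0.0
    for left, right in zip(nodes[:-1], nodes[1:]):
        mid = 0.5 * (left + right)
        total += (right - left) / 6.0 * (func(left) + 4.0 * func(mid) + func(right))
    return total


def clement_weights(v, nodes):
    """Weights v_a of (3.10) for the free vertices a = 1, ..., n."""
    weights = {}
    for a in range(1, len(nodes)):
        numerator = integrate(lambda x: v(x) * psi(a, x, nodes), nodes)
        denominator = integrate(lambda x: hat(a, x, nodes), nodes)
        weights[a] = numerator / denominator
    return weights


def clement_interpolant(v, nodes):
    """Return Pi_h v of (3.11) as a Python function of x."""
    weights = clement_weights(v, nodes)
    return lambda x: sum(w * hat(a, x, nodes) for a, w in weights.items())


if __name__ == "__main__":
    mesh = [k / 5 for k in range(6)]
    print(clement_weights(lambda x: x, mesh))
```

Because only basis functions of free vertices appear in (3.11), the interpolant vanishes on $\Gamma_1$ by construction, while the functions $\psi_a$ still sum to one over the whole domain.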
aeENy,0 


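The next result summarizes some basic estimates for $\Pi_h$. Its proof can be
found in [21].


Theorem 3.2. There exists an $h$-independent constant $C > 0$ such that for
all $v \in V$ and $f \in L^2(\Omega)$,

\[
|v - \Pi_h v|_{1,\Omega} \le C\,|v|_{1,\Omega}, \tag{3.12}
\]
\[
\int_\Omega f\,(v - \Pi_h v)\,dx \le C\,|v|_{1,\Omega} \Bigl( \sum_{a \in \mathcal{N}_{v,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,K_a}^2 \Bigr)^{1/2}, \tag{3.13}
\]
\[
\sum_{K \in \mathcal{P}_h} \|h_K^{-1}(v - \Pi_h v)\|_{0,K}^2 \le C\,|v|_{1,\Omega}^2, \tag{3.14}
\]
\[
\sum_{\gamma \in \mathcal{E}_h} \|h_\gamma^{-1/2}(v - \Pi_h v)\|_{0,\gamma}^2 \le C\,|v|_{1,\Omega}^2. \tag{3.15}
\]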

The finite element method for the variational inequality (3.5) is

\[
u_h \in V_h, \quad a(u_h, v_h - u_h) + j(v_h) - j(u_h) \ge \ell(v_h - u_h) \quad \forall\, v_h \in V_h. \tag{3.16}
\]

The discrete problem has a unique solution $u_h \in V_h$ by the standard existence
and uniqueness result on elliptic variational inequalities. We need the
following characterization of the finite element solution, similar to that of the
solution of the continuous problem.


Theorem 3.3. The unique solution $u_h \in V_h$ of the discrete problem (3.16)
is characterized by the existence of $\lambda_h \in L^\infty(\Gamma_2)$ such that

\[
a(u_h, v_h) + \int_{\Gamma_2} g\,\lambda_h\,v_h\,ds = \ell(v_h) \quad \forall\, v_h \in V_h, \tag{3.17}
\]
\[
|\lambda_h| \le 1, \quad \lambda_h u_h = |u_h| \quad \text{a.e. on } \Gamma_2. \tag{3.18}
\]

Proof. Assuming (3.16), let us prove (3.17) and (3.18). Taking first $v_h = 0$
and then $v_h = 2u_h$ in (3.16), we obtain

\[
a(u_h, u_h) + \int_{\Gamma_2} g\,|u_h|\,ds = \ell(u_h). \tag{3.19}
\]

Together with (3.19) the relation (3.16) leads to

\[
|\ell(v_h) - a(u_h, v_h)| \le \int_{\Gamma_2} g\,|v_h|\,ds \quad \forall\, v_h \in V_h. \tag{3.20}
\]

Write $V_h = V_h^0 \oplus V_h^\perp$, where $V_h^0 = V_h \cap H_0^1(\Omega)$ and $V_h^\perp$ is the orthogonal
complement of $V_h^0$ in $V_h$ with respect to the inner product of $H^1_{\Gamma_1}(\Omega)$. It follows
from (3.20) that $\ell(v_h) - a(u_h, v_h) = 0$ $\forall\, v_h \in V_h^0$. Notice that the trace operator
from $V_h^\perp$ onto $V_h^\perp|_{\Gamma_2} \subset L^1(\Gamma_2)$ is an isomorphism. Therefore, the mapping
$L(v_h) = \ell(v_h) - a(u_h, v_h)$ can be viewed as a linear functional on $V_h^\perp|_{\Gamma_2}$, where
on the right side $v_h$ is any element from the space $V_h$ whose trace on $\Gamma_2$ is the given
function. It follows from (3.20) that

\[
|L(v_h)| \le \int_{\Gamma_2} g\,|v_h|\,ds \quad \forall\, v_h \in V_h^\perp|_{\Gamma_2}. \tag{3.21}
\]

Thus, by the Hahn-Banach theorem the functional $L(v_h)$ can be extended to
$L(v)$ on $L^1(\Gamma_2)$ and so there exists $\lambda_h \in L^\infty(\Gamma_2)$ such that

\[
L(v) = \int_{\Gamma_2} \lambda_h\,g\,v\,ds \quad \forall\, v \in L^1(\Gamma_2)
\]

and $|\lambda_h| \le 1$ a.e. on $\Gamma_2$, from which (3.17) follows. Taking now $v_h = u_h$ in
relation (3.17), we have

\[
a(u_h, u_h) + \int_{\Gamma_2} g\,\lambda_h\,u_h\,ds = \ell(u_h),
\]

and using (3.19) we get

\[
\int_{\Gamma_2} g\,(|u_h| - \lambda_h u_h)\,ds = 0.
\]

Because $|\lambda_h| \le 1$ a.e. on $\Gamma_2$, we must have $|u_h| = \lambda_h u_h$ a.e. on $\Gamma_2$. This
completes the proof of (3.17) and (3.18).

Conversely, assume (3.17) and (3.18) hold. It follows from relation (3.17)
that

\[
a(u_h, v_h - u_h) + \int_{\Gamma_2} g\,\lambda_h\,(v_h - u_h)\,ds = \ell(v_h - u_h) \quad \forall\, v_h \in V_h,
\]

which can be rewritten as

\[
a(u_h, v_h - u_h) + \int_{\Gamma_2} g\,\lambda_h\,v_h\,ds - \int_{\Gamma_2} g\,\lambda_h\,u_h\,ds = \ell(v_h - u_h) \quad \forall\, v_h \in V_h.
\]

Then, relation (3.18) implies that

\[
a(u_h, v_h - u_h) + \int_{\Gamma_2} g\,\lambda_h\,v_h\,ds - \int_{\Gamma_2} g\,|u_h|\,ds = \ell(v_h - u_h) \quad \forall\, v_h \in V_h.
\]

Because $\lambda_h v_h \le |v_h|$ a.e. on $\Gamma_2$, it follows immediately that $u_h$ is the solution
of the discrete problem (3.16).
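
The characterization (3.17)-(3.18) can be observed numerically in a one-dimensional analogue of the discrete problem (3.16). The Python sketch below is an illustration under assumptions made here, not the method of this chapter: $\Omega = (0,1)$, $\Gamma_1 = \{0\}$, $\Gamma_2 = \{1\}$, a constant load $f$, linear elements on a uniform mesh with $n$ elements, and hypothetical function names. Since $\Gamma_2$ is a single point, the friction term is $g\,|u_h(1)|$ and (3.17) reads $K u + g\lambda_h e_n = b$ in matrix form. Writing $u = w - g\lambda_h z$ with $Kw = b$ and $Kz = e_n$, condition (3.18) is satisfied by clamping $w_n/(g z_n)$ to $[-1, 1]$.

```python
def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system with constant off-diagonal."""
    n = len(diag)
    d = list(diag)
    r = list(rhs)
    for i in range(1, n):            # forward elimination
        m = off / d[i - 1]
        d[i] -= m * off
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = (r[i] - off * x[i + 1]) / d[i]
    return x


def solve_friction_1d(n, f, g):
    """Return (u, lam): nodal values u_1..u_n and the multiplier lambda_h at x = 1."""
    h = 1.0 / n
    # Stiffness plus mass matrix of a(u, v) = int (u'v' + uv) dx with u(0) = 0.
    diag = [2.0 / h + 2.0 * h / 3.0] * (n - 1) + [1.0 / h + h / 3.0]
    off = -1.0 / h + h / 6.0
    load = [f * h] * (n - 1) + [0.5 * f * h]   # l(phi_i) for constant f
    unit = [0.0] * (n - 1) + [1.0]             # trace functional v -> v(1)
    w = solve_tridiagonal(diag, off, load)     # solution without friction
    z = solve_tridiagonal(diag, off, unit)     # response to a unit boundary load
    # lambda_h = 1 or -1 when the boundary value is nonzero (slip),
    # and |lambda_h| <= 1 with u_h(1) = 0 otherwise (stick).
    lam = max(-1.0, min(1.0, w[-1] / (g * z[-1])))
    u = [wi - g * lam * zi for wi, zi in zip(w, z)]
    return u, lam


if __name__ == "__main__":
    for g in (0.1, 10.0):
        u, lam = solve_friction_1d(50, 1.0, g)
        print(g, lam, u[-1])
```

For a small friction bound the boundary value is nonzero and the multiplier sits at the bound $|\lambda_h| = 1$; for a large bound the boundary value vanishes and $\lambda_h$ lies strictly inside $(-1, 1)$, in agreement with (3.18).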


Convergence and a priori error estimates for the finite element method 
(3.16) can be found in the literature (e.g., [34, 35]). Here, we focus on the 
derivation and analysis of a posteriori error estimators that can be used in 
the adaptive finite element solution of variational inequalities. 

In investigating the efficiency of the a posteriori error estimators, we
follow Verfürth [70], with special attention paid to the inequality feature
of the problem. The argument makes use of the canonical bubble functions
constructed for each element $K \in \mathcal{P}_h$ and each side $\gamma \in \mathcal{E}_h$.

Denote by $P_K$ a polynomial space associated with the element $K$. The
following two theorems provide some basic properties of the bubble functions
used to derive lower bounds. For more details on bubble functions and proofs
see [1].

Theorem 3.4. Let $K \in \mathcal{P}_h$ and $\psi_K$ be its corresponding bubble function.
Then there exists a constant $C$, independent of $h_K$, such that for any $v \in P_K$,

\[
C^{-1} \|v\|_{0,K}^2 \le \int_K \psi_K\,v^2\,dx \le C\,\|v\|_{0,K}^2,
\]
\[
C^{-1} \|v\|_{0,K} \le \|\psi_K v\|_{0,K} + h_K\,|\psi_K v|_{1,K} \le C\,\|v\|_{0,K}.
\]


Theorem 3.5. Let $K \in \mathcal{P}_h$ and $\gamma \in \mathcal{E}(K)$ be one of its sides. Let $\psi_\gamma$ be
the side bubble function corresponding to $\gamma$. Then there exists a constant $C$,
independent of $h_K$, such that for any $v \in P_K$,

\[
C^{-1} \|v\|_{0,\gamma}^2 \le \int_\gamma \psi_\gamma\,v^2\,ds \le C\,\|v\|_{0,\gamma}^2,
\]
\[
h_K^{-1/2}\,\|\psi_\gamma v\|_{0,K} + h_K^{1/2}\,|\psi_\gamma v|_{1,K} \le C\,\|v\|_{0,\gamma}.
\]


3.3 Dual Formulation and A Posteriori Error Estimation 


We now present a dual formulation for the model elliptic variational inequal- 
ity within the framework of Theorem 3.1. The dual formulation is used in 
the derivation of a posteriori error estimators for approximate solutions. 
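Let us choose the spaces, functionals, and operators needed in applying
Theorem 3.1. The space $V$ is defined in (3.4). Let $Q = (L^2(\Omega))^d \times
L^2(\Omega) \times L^2(\Gamma_2)$. Any element $q \in Q$ is written as $q = (q_1, q_2, q_3)$, where
$q_1 \in (L^2(\Omega))^d$, $q_2 \in L^2(\Omega)$, and $q_3 \in L^2(\Gamma_2)$. Let $V^*$ and $Q^* = (L^2(\Omega))^d \times
L^2(\Omega) \times L^2(\Gamma_2)$ be the duals of $V$ and $Q$ placed in duality by the pairings
$\langle \cdot, \cdot \rangle_V$ and $\langle \cdot, \cdot \rangle_Q$, respectively. Introduce a linear bounded operator
$A : V \to Q$ by the relation

\[
Av = (\nabla v,\; v,\; v|_{\Gamma_2}) \quad \forall\, v \in V.
\]

Define

\[
F(v, q) = \int_\Omega \Bigl[ \tfrac{1}{2} \bigl( |q_1|^2 + |q_2|^2 \bigr) - f\,v \Bigr] dx + \int_{\Gamma_2} g\,|q_3|\,ds, \quad v \in V,\; q \in Q.
\]

Then

\[
J(v) = F(v, Av) \quad \forall\, v \in V,
\]

and we rewrite the minimization problem (3.6) as

\[
u \in V, \quad F(u, Au) = \inf_{v \in V} F(v, Av). \tag{3.22}
\]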


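To apply Theorem 3.1, we need the conjugate function

\[
F^*(A^* q^*, -q^*) = \sup_{v \in V,\; q \in Q} \bigl\{ \langle Av, q^* \rangle_Q - \langle q, q^* \rangle_Q - F(v, q) \bigr\},
\]

where $A^* : Q^* \to V^*$ is the adjoint of $A$. We have

\[
\begin{aligned}
F^*(A^* q^*, -q^*) = \sup_{v \in V,\; q \in Q} \Bigl\{ & \int_\Omega \bigl[ q_1^* \cdot \nabla v + (q_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds \\
& - \int_\Omega \bigl( \tfrac{1}{2} |q_1|^2 + q_1 \cdot q_1^* \bigr) dx - \int_\Omega \bigl( \tfrac{1}{2} |q_2|^2 + q_2\,q_2^* \bigr) dx \\
& - \int_{\Gamma_2} \bigl( q_3\,q_3^* + g\,|q_3| \bigr) ds \Bigr\}.
\end{aligned} \tag{3.23}
\]

It can easily be verified that

\[
\sup_{q_1 \in (L^2(\Omega))^d} \Bigl\{ -\int_\Omega \bigl( \tfrac{1}{2} |q_1|^2 + q_1 \cdot q_1^* \bigr) dx \Bigr\} = \int_\Omega \tfrac{1}{2} |q_1^*|^2\,dx,
\]
\[
\sup_{q_2 \in L^2(\Omega)} \Bigl\{ -\int_\Omega \bigl( \tfrac{1}{2} |q_2|^2 + q_2\,q_2^* \bigr) dx \Bigr\} = \int_\Omega \tfrac{1}{2} |q_2^*|^2\,dx,
\]
\[
\sup_{q_3 \in L^2(\Gamma_2)} \Bigl\{ -\int_{\Gamma_2} \bigl( q_3\,q_3^* + g\,|q_3| \bigr) ds \Bigr\} =
\begin{cases}
0, & \text{if } |q_3^*| \le g \text{ a.e. on } \Gamma_2, \\
\infty, & \text{otherwise}.
\end{cases}
\]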
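
The three suprema above are taken pointwise under the integral sign, so their scalar versions can be checked numerically. The Python sketch below is an illustration with hypothetical function names: it replaces each supremum over a function space by a maximum of the scalar integrand over a uniform grid on $[-\rho, \rho]$, where the radius $\rho$ and the number of grid points are arbitrary choices made here. For the friction term the bounded grid can only suggest the value $\infty$: when $|q_3^*| > g$ the computed maximum equals $\rho\,(|q_3^*| - g)$ and grows linearly with $\rho$.

```python
def sup_on_grid(func, radius=50.0, steps=20001):
    """Maximum of func over a uniform grid on [-radius, radius]."""
    return max(func(-radius + 2.0 * radius * k / (steps - 1)) for k in range(steps))


def conjugate_quadratic(q_star):
    """Scalar version of sup_q { -(q^2 / 2 + q q*) }, expected to equal (q*)^2 / 2."""
    return sup_on_grid(lambda q: -(0.5 * q * q + q * q_star))


def conjugate_friction(q_star, g, radius=50.0):
    """Scalar version of sup_q { -(q q* + g |q|) } restricted to |q| <= radius."""
    return sup_on_grid(lambda q: -(q * q_star + g * abs(q)), radius=radius)


if __name__ == "__main__":
    print(conjugate_quadratic(1.5))               # close to 0.5 * 1.5**2
    print(conjugate_friction(0.7, 1.0))           # |q*| <= g: supremum is 0
    print(conjugate_friction(1.5, 1.0, 50.0))     # |q*| > g: grows with the radius
    print(conjugate_friction(1.5, 1.0, 500.0))
```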


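Note that the term

\[
\sup_{v \in V} \Bigl\{ \int_\Omega \bigl[ q_1^* \cdot \nabla v + (q_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds \Bigr\}
\]

equals $\infty$, unless $q^* \in Q^*$ satisfies

\[
\int_\Omega \bigl[ q_1^* \cdot \nabla v + (q_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds = 0 \quad \forall\, v \in V;
\]

and under this assumption, the above term equals 0. Thus, the conjugate
function (3.23) is

\[
F^*(A^* q^*, -q^*) =
\begin{cases}
\displaystyle \frac{1}{2} \int_\Omega \bigl( |q_1^*|^2 + |q_2^*|^2 \bigr) dx, & \text{if } q^* \in Q^*_{f,g}, \\
+\infty, & \text{otherwise},
\end{cases} \tag{3.24}
\]

where the admissible dual function set is

\[
Q^*_{f,g} = \Bigl\{ q^* \in Q^* : \int_\Omega \bigl[ q_1^* \cdot \nabla v + (q_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds = 0 \;\; \forall\, v \in V, \quad |q_3^*| \le g \text{ a.e. on } \Gamma_2 \Bigr\}. \tag{3.25}
\]

Note that the classical form of the constraint $q^* \in Q^*_{f,g}$ is

\[
\begin{aligned}
\operatorname{div} q_1^* - q_2^* &= f && \text{in } \Omega, \\
q_1^* \cdot \nu + q_3^* &= 0 && \text{on } \Gamma_2, \\
|q_3^*| &\le g && \text{on } \Gamma_2.
\end{aligned}
\]

In conclusion, the dual problem of (3.22) is

\[
p^* \in Q^*_{f,g}, \quad -F^*(A^* p^*, -p^*) = \sup_{q^* \in Q^*_{f,g}} \bigl\{ -F^*(A^* q^*, -q^*) \bigr\}. \tag{3.26}
\]

Existence of solutions of the problems (3.22) and (3.26) is assured by Theorem
3.1, and moreover,

\[
F(u, Au) = -F^*(A^* p^*, -p^*). \tag{3.27}
\]

The function $q^* \mapsto F^*(A^* q^*, -q^*)$ is strictly convex over $Q^*_{f,g}$, thus a solution
of the dual problem (3.26) is unique.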

Now let w € V be an (arbitrary) approximation of u € V, the unique 
solution of (3.5). In the rest of the section, we present a general framework 
for a posteriori estimates of the error u—w. The error bounds are computable 
from the (known) approximant w. Later on, w is taken as the finite element 
solution of the variational inequality. 

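Consider

\[
\tfrac{1}{2}\,a(u - w, u - w) = \tfrac{1}{2}\,a(w, w) - a(u, w) + \tfrac{1}{2}\,a(u, u).
\]

By using (3.5) with $v = w$ and the definition (3.7),

\[
\tfrac{1}{2}\,a(u - w, u - w) = \tfrac{1}{2}\,a(w, w) - a(u, w - u) - \tfrac{1}{2}\,a(u, u) \le J(w) - J(u).
\]

Relation (3.27) implies

\[
J(u) = F(u, Au) = -F^*(A^* p^*, -p^*) \ge -F^*(A^* q^*, -q^*) \quad \forall\, q^* \in Q^*_{f,g}.
\]

Therefore,

\[
\tfrac{1}{2}\,a(u - w, u - w) \le J(w) + F^*(A^* q^*, -q^*) \quad \forall\, q^* = (q_1^*, q_2^*, q_3^*) \in Q^*_{f,g}. \tag{3.28}
\]
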
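Introduce a function space

\[
Q_r^* = (L^2(\Omega))^d \times L^2(\Omega),
\]

and write $r^* = (r_1^*, r_2^*)$ for any $r^* \in Q_r^*$. We have

\[
\begin{aligned}
J(w) + F^*(A^* q^*, -q^*) = {} & \int_\Omega \tfrac{1}{2} \bigl( |\nabla w + r_1^*|^2 + |w + r_2^*|^2 \bigr) dx \\
& - \int_\Omega \bigl( \nabla w \cdot r_1^* + w\,r_2^* + f\,w \bigr) dx + \int_{\Gamma_2} g\,|w|\,ds \\
& + \int_\Omega \tfrac{1}{2} \bigl( |q_1^*|^2 - |r_1^*|^2 + |q_2^*|^2 - |r_2^*|^2 \bigr) dx.
\end{aligned} \tag{3.29}
\]

Because $q^* = (q_1^*, q_2^*, q_3^*) \in Q^*_{f,g}$, from (3.25),

\[
\int_\Omega \bigl( q_1^* \cdot \nabla w + q_2^*\,w \bigr) dx + \int_{\Gamma_2} q_3^*\,w\,ds = -\int_\Omega f\,w\,dx.
\]

Using this relation in (3.29) and recalling (3.28), we find that

\[
\begin{aligned}
\tfrac{1}{2}\,a(u - w, u - w) \le {} & \int_\Omega \tfrac{1}{2} \bigl( |\nabla w + r_1^*|^2 + |w + r_2^*|^2 \bigr) dx + \int_\Omega \tfrac{1}{2} \bigl( |q_1^* - r_1^*|^2 + |q_2^* - r_2^*|^2 \bigr) dx \\
& + \int_\Omega \bigl[ (q_1^* - r_1^*) \cdot (\nabla w + r_1^*) + (q_2^* - r_2^*)(w + r_2^*) \bigr] dx \\
& + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds.
\end{aligned}
\]

So for any $q^* \in Q^*_{f,g}$ and $r^* \in Q_r^*$,

\[
\begin{aligned}
\tfrac{1}{2}\,a(u - w, u - w) \le {} & \int_\Omega \bigl( |\nabla w + r_1^*|^2 + |w + r_2^*|^2 \bigr) dx \\
& + \int_\Omega \bigl( |q_1^* - r_1^*|^2 + |q_2^* - r_2^*|^2 \bigr) dx + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds.
\end{aligned}
\]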

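Thus, we have established the following result.


Theorem 3.6. Let $u \in V$ be the unique solution of (3.5), and $w \in V$ an
approximation of $u$. Then the following estimate holds for any $r^* \in Q_r^*$:

\[
\begin{aligned}
\tfrac{1}{2}\,a(u - w, u - w) \le {} & \int_\Omega \bigl( |\nabla w + r_1^*|^2 + |w + r_2^*|^2 \bigr) dx \\
& + \inf_{q^* \in Q^*_{f,g}} \Bigl\{ \int_\Omega \bigl( |q_1^* - r_1^*|^2 + |q_2^* - r_2^*|^2 \bigr) dx + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\}.
\end{aligned} \tag{3.30}
\]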


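Consider the second term

\[
II = \inf_{q^* \in Q^*_{f,g}} \Bigl\{ \int_\Omega \bigl( |q_1^* - r_1^*|^2 + |q_2^* - r_2^*|^2 \bigr) dx + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\}
\]

on the right side of the estimate (3.30). From the definition (3.25),

\[
\begin{aligned}
II = \inf_{\substack{q^* \in Q^* \\ |q_3^*| \le g}} \sup_{v \in V} \Bigl\{ & \int_\Omega \bigl( |q_1^* - r_1^*|^2 + |q_2^* - r_2^*|^2 \bigr) dx + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \\
& + \int_\Omega \bigl[ q_1^* \cdot \nabla v + (q_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds \Bigr\}.
\end{aligned}
\]

Here and below, the condition "$|q_3^*| \le g$" stands for "$|q_3^*| \le g$ a.e. on $\Gamma_2$."
Substitute $q_1^* - r_1^*$ by $q_1^*$, $q_2^* - r_2^*$ by $q_2^*$, and regroup the terms to get

\[
\begin{aligned}
II = {} & \inf_{\substack{q^* \in Q^* \\ |q_3^*| \le g}} \sup_{v \in V} \Bigl\{ \int_\Omega \bigl( |q_1^*|^2 + q_1^* \cdot \nabla v + |q_2^*|^2 + q_2^*\,v \bigr) dx \\
& \qquad + \int_\Omega \bigl[ r_1^* \cdot \nabla v + (r_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w + q_3^*\,v \bigr) ds \Bigr\} \\
= {} & \inf_{|q_3^*| \le g} \sup_{v \in V} \Bigl\{ \int_\Omega \bigl[ -\tfrac{1}{4} \bigl( |\nabla v|^2 + v^2 \bigr) + r_1^* \cdot \nabla v + (r_2^* + f)\,v \bigr] dx \\
& \qquad + \int_{\Gamma_2} q_3^*\,v\,ds + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\}.
\end{aligned}
\]

Define the residual

\[
R(q_3^*, r^*) = \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl\{ \int_\Omega \bigl[ r_1^* \cdot \nabla v + (r_2^* + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds \Bigr\}. \tag{3.31}
\]

Then

\[
\begin{aligned}
II & \le \inf_{|q_3^*| \le g} \sup_{v \in V} \Bigl\{ -\tfrac{1}{4} \|v\|_V^2 + R(q_3^*, r^*)\,\|v\|_V + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\} \\
& = \inf_{|q_3^*| \le g} \Bigl\{ R(q_3^*, r^*)^2 + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\}.
\end{aligned}
\]
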
This last estimate is combined with Theorem 3.6, leading to the next
result.

Theorem 3.7. Let $u \in V$ be the unique solution of (3.5), and $w \in V$ an
approximation of $u$. Then for any $r^* = (r_1^*, r_2^*) \in Q_r^*$,

\[
\begin{aligned}
\tfrac{1}{2}\,a(u - w, u - w) \le {} & \int_\Omega \bigl( |\nabla w + r_1^*|^2 + |w + r_2^*|^2 \bigr) dx \\
& + \inf_{|q_3^*| \le g} \Bigl\{ R(q_3^*, r^*)^2 + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr\},
\end{aligned} \tag{3.32}
\]

where the residual $R(q_3^*, r^*)$ is defined by (3.31).

In the next two sections, we make particular choices of $r_1^*$ and $r_2^*$ in (3.32)
to obtain residual-based and recovery-based error estimates of finite element
solutions of the model elliptic variational inequality.


3.4 Residual-Based Error Estimates for the Model 
Elliptic Variational Inequality 


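In (3.32), we let $r_1^* = -\nabla w$ and $r_2^* = -w$. Then

\[
\tfrac{1}{2}\,a(u - w, u - w) \le R^2, \tag{3.33}
\]

where

\[
R = \inf_{|q_3^*| \le g} \Bigl[ \Bigl( \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl( \int_\Omega \bigl[ -\nabla w \cdot \nabla v + (-w + f)\,v \bigr] dx + \int_{\Gamma_2} q_3^*\,v\,ds \Bigr) \Bigr)^2 + \int_{\Gamma_2} \bigl( g\,|w| + q_3^*\,w \bigr) ds \Bigr]^{1/2}. \tag{3.34}
\]
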
Although it is possible to derive (3.33)—(3.34) through other approaches, 
we comment that Theorem 3.7 provides a general framework for various a 
posteriori error estimates with different choices of the auxiliary variable r*. 

In the limiting case $g = 0$, the problem (3.5) reduces to the variational
equation

\[
u \in V, \quad a(u, v) = \ell(v) \quad \forall\, v \in V.
\]

Correspondingly, the estimate (3.33)-(3.34) reduces to the familiar form

\[
\tfrac{1}{2}\,a(u - w, u - w) \le \Bigl( \sup_{v \in V} \frac{1}{\|v\|_V} \int_\Omega \bigl[ (f - w)\,v - \nabla w \cdot \nabla v \bigr] dx \Bigr)^2,
\]

which is a starting point in deriving some a posteriori error estimators for
Galerkin approximations of linear elliptic partial differential equations (cf.
[1]).

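Now we focus on a posteriori analysis for the finite element solution error,
that is, for the situation where the approximant $w = u_h$ is the finite element
solution. Take $q_3^* = -g\lambda_h$ in (3.34). The boundary integral
$\int_{\Gamma_2} (g\,|u_h| + q_3^*\,u_h)\,ds$ then vanishes by (3.18), and (3.17) allows us to
subtract an arbitrary $v_h \in V_h$ from the test function $v$. We obtain

\[
R \le \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl\{ \int_\Omega \bigl[ \nabla u_h \cdot \nabla (v - v_h) + (u_h - f)(v - v_h) \bigr] dx + \int_{\Gamma_2} g\,\lambda_h\,(v - v_h)\,ds \Bigr\} \tag{3.35}
\]

for any $v_h \in V_h$. For a given $v \in V$, take $v_h = \Pi_h v$
in (3.35), where $\Pi_h v$ is defined in (3.11). Then

\[
R \le \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl\{ \int_\Omega \bigl[ \nabla u_h \cdot \nabla (v - \Pi_h v) + (u_h - f)(v - \Pi_h v) \bigr] dx + \int_{\Gamma_2} g\,\lambda_h\,(v - \Pi_h v)\,ds \Bigr\}.
\]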


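Decompose the integrals into local contributions from each element $K \in \mathcal{P}_h$,
and integrate by parts over $K$ to obtain

\[
\begin{aligned}
R \le \sup_{v \in V} \frac{1}{\|v\|_V} \sum_{K \in \mathcal{P}_h} \Bigl\{ & \int_{\partial K} \frac{\partial u_h}{\partial \nu_K}\,(v - \Pi_h v)\,ds + \int_K (-\Delta u_h + u_h - f)(v - \Pi_h v)\,dx \\
& + \int_{\partial K \cap \Gamma_2} g\,\lambda_h\,(v - \Pi_h v)\,ds \Bigr\}.
\end{aligned} \tag{3.36}
\]

Define interior residuals for each element $K \in \mathcal{P}_h$ by

\[
r_K = -\Delta u_h + u_h - f \quad \text{in } K, \tag{3.37}
\]

and side residuals for each side $\gamma \in \mathcal{E}_{h,0,\Gamma_2} = \mathcal{E}_{h,0} \cup \mathcal{E}_{h,\Gamma_2}$ by

\[
R_\gamma =
\begin{cases}
\Bigl[ \dfrac{\partial u_h}{\partial \nu} \Bigr]_\gamma & \text{if } \gamma \in \mathcal{E}_{h,0}, \\[2ex]
\dfrac{\partial u_h}{\partial \nu} + g\,\lambda_h & \text{if } \gamma \in \mathcal{E}_{h,\Gamma_2},
\end{cases} \tag{3.38}
\]

where the quantity

\[
\Bigl[ \frac{\partial u_h}{\partial \nu} \Bigr]_\gamma = \nu_K \cdot (\nabla u_h)_K + \nu_{K'} \cdot (\nabla u_h)_{K'}
\]

represents the jump discontinuity in the approximation to the normal derivative
on the side $\gamma$ which separates the neighboring elements $K$ and $K'$. By
using definitions (3.37) and (3.38), relation (3.36) reduces to

\[
R \le \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl\{ \sum_{K \in \mathcal{P}_h} \int_K r_K\,(v - \Pi_h v)\,dx + \sum_{\gamma \in \mathcal{E}_{h,0,\Gamma_2}} \int_\gamma R_\gamma\,(v - \Pi_h v)\,ds \Bigr\}. \tag{3.39}
\]
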


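Using the estimates (3.14) and (3.15) in (3.39), and applying the Cauchy-Schwarz
inequality, we have

\[
\begin{aligned}
R & \le \sup_{v \in V} \frac{1}{\|v\|_V} \Bigl\{ C\,|v|_{1,\Omega} \Bigl( \sum_{K \in \mathcal{P}_h} h_K^2\,\|r_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,0,\Gamma_2}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2 \Bigr)^{1/2} \Bigr\} \\
& \le C \Bigl( \sum_{K \in \mathcal{P}_h} h_K^2\,\|r_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,0,\Gamma_2}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2 \Bigr)^{1/2}.
\end{aligned} \tag{3.40}
\]
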


We summarize the above results in the form of a theorem. 


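Theorem 3.8. Let $u$ and $u_h$ be the unique solutions of (3.5) and (3.16),
respectively. Then the error $e_h = u - u_h$ satisfies the a posteriori estimate

\[
\|e_h\|_{1,\Omega}^2 \le C \Bigl( \sum_{K \in \mathcal{P}_h} h_K^2\,\|r_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,0,\Gamma_2}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2 \Bigr), \tag{3.41}
\]

where $r_K$ and $R_\gamma$ are interior and side residuals, defined by (3.37) and (3.38),
respectively.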


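In practical computations, the terms on the right side of (3.41) are regrouped
by writing

\[
\|e_h\|_{1,\Omega}^2 \le C\,\eta_R^2, \qquad \eta_R^2 = \sum_{K \in \mathcal{P}_h} \eta_{R,K}^2, \tag{3.42}
\]

where the local error indicator $\eta_{R,K}$ on each element $K$, defined by

\[
\eta_{R,K}^2 = h_K^2\,\|r_K\|_{0,K}^2 + \frac{1}{2} \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,0}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2, \tag{3.43}
\]

identifies contributions from each of the elements to the global error.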
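
Once the residual norms are available, assembling (3.42)-(3.43) is pure bookkeeping. The Python sketch below illustrates it with hypothetical data structures (`Side`, `Element`) that are assumptions made here, not part of the chapter. It takes the squared norms $\|r_K\|_{0,K}^2$ and $\|R_\gamma\|_{0,\gamma}^2$ as given inputs and does not compute them. Interior sides carry the weight $1/2$ because each of them is shared by two elements, sides on $\Gamma_2$ carry the weight $1$, and sides on $\Gamma_1$ are simply not listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Side:
    h: float            # diameter h_gamma of the side
    res_sq: float       # squared norm of the side residual R_gamma on gamma
    on_gamma2: bool     # True for a side on Gamma_2, False for an interior side


@dataclass
class Element:
    h: float            # diameter h_K of the element
    res_sq: float       # squared norm of the interior residual r_K on K
    sides: List[Side]   # interior and Gamma_2 sides of K


def local_indicator_sq(element):
    """Squared local indicator eta_{R,K}^2 of (3.43)."""
    eta_sq = element.h ** 2 * element.res_sq
    for side in element.sides:
        weight = 1.0 if side.on_gamma2 else 0.5
        eta_sq += weight * side.h * side.res_sq
    return eta_sq


def global_estimator_sq(elements):
    """Squared global estimator eta_R^2 of (3.42)."""
    return sum(local_indicator_sq(element) for element in elements)


if __name__ == "__main__":
    K = Element(h=0.5, res_sq=4.0,
                sides=[Side(0.5, 2.0, False), Side(0.5, 2.0, True)])
    print(local_indicator_sq(K))
```

In an adaptive loop the values $\eta_{R,K}^2$ would typically be compared across elements to decide which ones to refine; the marking rule itself is a separate design choice and is not fixed by (3.43).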

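In the last part of the section, we explore the efficiency of the error bound
from (3.41) or (3.42). We derive an upper bound for the error estimator.
Integrating by parts over each element and using (3.8) and Theorem 3.18 we
have, for any $v \in V$,

\[
\begin{aligned}
a(e_h, v) & = a(u, v) - a(u_h, v) \\
& = \int_\Omega f\,v\,dx - \int_{\Gamma_2} g\,\lambda\,v\,ds - \int_\Omega \bigl( \nabla u_h \cdot \nabla v + u_h\,v \bigr) dx \\
& = \sum_{K \in \mathcal{P}_h} \int_K (\Delta u_h - u_h + f)\,v\,dx - \sum_{\gamma \in \mathcal{E}_{h,0}} \int_\gamma \Bigl[ \frac{\partial u_h}{\partial \nu} \Bigr]_\gamma v\,ds - \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} \int_\gamma \Bigl( \frac{\partial u_h}{\partial \nu} + g\,\lambda \Bigr) v\,ds.
\end{aligned}
\]

Thus,

\[
a(e_h, v) = -\sum_{K \in \mathcal{P}_h} \int_K r_K\,v\,dx - \sum_{\gamma \in \mathcal{E}_{h,0,\Gamma_2}} \int_\gamma R_\gamma\,v\,ds + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} \int_\gamma g\,(\lambda_h - \lambda)\,v\,ds, \tag{3.44}
\]

where $r_K$ and $R_\gamma$ are interior and side residuals defined for each element
$K \in \mathcal{P}_h$ and each side $\gamma \in \mathcal{E}_{h,0,\Gamma_2}$ by (3.37) and (3.38), respectively. In order
to simplify notation, we omit the subscripts $K$ and $\gamma$. In the following, we
apply Theorems 3.4 and 3.5 choosing $P_K$ as the space of polynomials of degree
less than or equal to $l$, where $l$ is any integer larger than or equal to the local
polynomial degree of the finite element functions. Let $\bar{r}$ be a discontinuous
piecewise polynomial approximation to the residual $r$; that is, $\bar{r}|_K \in P_K$.
Applying Theorem 3.4, we get

\[
\|\bar{r}\|_{0,K}^2 \le C \int_K \psi_K\,\bar{r}^2\,dx. \tag{3.45}
\]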
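Because the function $v = \psi_K \bar{r}$ vanishes on the boundary $\partial K$, it can be
extended to a function in $V$ by 0 to the rest of the domain $\Omega$. Inserting this
extended function $v$ in the residual equation (3.44), one obtains

\[
a(e_h, \psi_K \bar{r}) = -\int_K r\,\psi_K\,\bar{r}\,dx.
\]

Using this relation, we obtain

\[
\int_K \psi_K\,\bar{r}^2\,dx = \int_K \psi_K\,\bar{r}\,(\bar{r} - r)\,dx - a(e_h, \psi_K \bar{r}). \tag{3.46}
\]

The terms on the right side of (3.46) are bounded by making use of the
Cauchy-Schwarz inequality and the second part of Theorem 3.4,

\[
\int_K \psi_K\,\bar{r}^2\,dx \le C\,\|\bar{r}\|_{0,K} \bigl( \|\bar{r} - r\|_{0,K} + h_K^{-1}\,\|e_h\|_{1,K} \bigr).
\]

Combined with (3.45) we have

\[
\|\bar{r}\|_{0,K} \le C \bigl( \|\bar{r} - r\|_{0,K} + h_K^{-1}\,\|e_h\|_{1,K} \bigr).
\]

With the aid of the triangle inequality, finally we get

\[
\|r\|_{0,K} \le C \bigl( \|\bar{r} - r\|_{0,K} + h_K^{-1}\,\|e_h\|_{1,K} \bigr). \tag{3.47}
\]

Consider now an interior side $\gamma \in \mathcal{E}_{h,0}$. From the first part of Theorem 3.5
it follows that

\[
\|R\|_{0,\gamma}^2 \le C \int_\gamma \psi_\gamma\,R^2\,ds. \tag{3.48}
\]

Let $\tilde{\gamma}$ denote the subdomain of $\Omega$ consisting of the side $\gamma$ and the two neighbouring
elements. The function $v = \psi_\gamma R$ vanishes on $\partial\tilde{\gamma}$ and as before it can
be extended continuously to the whole domain $\Omega$ by 0 outside $\tilde{\gamma}$. With this
choice of $v$ the residual equation (3.44) reduces to

\[
a(e_h, \psi_\gamma R) = -\int_{\tilde{\gamma}} r\,\psi_\gamma\,R\,dx - \int_\gamma \psi_\gamma\,R^2\,ds.
\]

Therefore,

\[
\int_\gamma \psi_\gamma\,R^2\,ds = -a(e_h, \psi_\gamma R) - \int_{\tilde{\gamma}} r\,\psi_\gamma\,R\,dx. \tag{3.49}
\]

Applying the Cauchy-Schwarz inequality and the second part of Theorem
3.5 to the terms on the right side of (3.49), we obtain

\[
\int_\gamma \psi_\gamma\,R^2\,ds \le C \bigl( h_\gamma^{-1/2}\,\|e_h\|_{1,\tilde{\gamma}} + h_\gamma^{1/2}\,\|r\|_{0,\tilde{\gamma}} \bigr) \|R\|_{0,\gamma},
\]

which, combined with (3.48) and (3.47), implies that for every interior side
$\gamma \in \mathcal{E}_{h,0}$,

\[
\|R\|_{0,\gamma} \le C \bigl( h_\gamma^{-1/2}\,\|e_h\|_{1,\tilde{\gamma}} + h_\gamma^{1/2}\,\|\bar{r} - r\|_{0,\tilde{\gamma}} \bigr). \tag{3.50}
\]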


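Finally, consider those sides $\gamma$ lying on $\Gamma_2$. Denote by $\bar{R} \in P_K$ an approximation
to the residual $R = (\partial u_h/\partial \nu) + g\lambda_h$ on $\gamma$, $\gamma \in \mathcal{E}(K)$. The first part of
Theorem 3.5 implies

\[
\|\bar{R}\|_{0,\gamma}^2 \le C \int_\gamma \psi_\gamma\,\bar{R}^2\,ds. \tag{3.51}
\]

Define the function $v = \psi_\gamma \bar{R}$ and let $\tilde{\gamma}$ be the element whose boundary
contains the side $\gamma$. Then $v|_{\partial\tilde{\gamma} \setminus \gamma} = 0$. Extend this function to the whole
domain by zero value outside $\tilde{\gamma}$. The residual equation (3.44), with this choice
of $v$, becomes

\[
a(e_h, \psi_\gamma \bar{R}) = -\int_{\tilde{\gamma}} r\,\psi_\gamma\,\bar{R}\,dx - \int_\gamma R\,\psi_\gamma\,\bar{R}\,ds + \int_\gamma g\,(\lambda_h - \lambda)\,\psi_\gamma\,\bar{R}\,ds,
\]

which leads to

\[
\int_\gamma \psi_\gamma\,\bar{R}^2\,ds = \int_\gamma \psi_\gamma\,\bar{R}\,(\bar{R} - R)\,ds - a(e_h, \psi_\gamma \bar{R}) - \int_{\tilde{\gamma}} r\,\psi_\gamma\,\bar{R}\,dx + \int_\gamma g\,(\lambda_h - \lambda)\,\psi_\gamma\,\bar{R}\,ds. \tag{3.52}
\]

As before, the first three terms on the right side of (3.52) can be bounded by
applying Theorem 3.5 and the Cauchy-Schwarz inequality. Using (3.51), we
then obtain for each side $\gamma \in \mathcal{E}_{h,\Gamma_2}$

\[
\begin{aligned}
\|\bar{R}\|_{0,\gamma}^2 \le {} & C \bigl( \|\bar{R}\|_{0,\gamma}\,\|\bar{R} - R\|_{0,\gamma} + h_\gamma^{-1/2}\,\|\bar{R}\|_{0,\gamma}\,\|e_h\|_{1,\tilde{\gamma}} + h_\gamma^{1/2}\,\|\bar{R}\|_{0,\gamma}\,\|r\|_{0,\tilde{\gamma}} \bigr) \\
& + C \int_\gamma g\,(\lambda_h - \lambda)\,\psi_\gamma\,\bar{R}\,ds.
\end{aligned}
\]

Multiplying this inequality by $h_\gamma$ and summing over all sides $\gamma \in \mathcal{E}_{h,\Gamma_2}$, we
get

\[
\begin{aligned}
\sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\bar{R}_\gamma\|_{0,\gamma}^2 \le {} & C \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^{1/2}\,\|\bar{R}_\gamma\|_{0,\gamma}\; h_\gamma^{1/2}\,\|\bar{R}_\gamma - R_\gamma\|_{0,\gamma} \\
& + C \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^{1/2}\,\|\bar{R}_\gamma\|_{0,\gamma}\,\|e_h\|_{1,\tilde{\gamma}} \\
& + C \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^{1/2}\,\|\bar{R}_\gamma\|_{0,\gamma}\; h_\gamma\,\|r_{\tilde{\gamma}}\|_{0,\tilde{\gamma}} + C\,|R_{h,\Gamma_2}|,
\end{aligned} \tag{3.53}
\]

where

\[
R_{h,\Gamma_2} = \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} \int_\gamma g\,(\lambda_h - \lambda)\,h_\gamma\,\psi_\gamma\,\bar{R}_\gamma\,ds.
\]

We can bound $R_{h,\Gamma_2}$ as follows,

\[
\begin{aligned}
|R_{h,\Gamma_2}| & \le \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^{1/2}\,\|g\,(\lambda - \lambda_h)\|_{0,\gamma}\; h_\gamma^{1/2}\,\|\psi_\gamma \bar{R}_\gamma\|_{0,\gamma} \\
& \le C \Bigl( \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\lambda - \lambda_h\|_{0,\gamma}^2 \Bigr)^{1/2} \Bigl( \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\bar{R}_\gamma\|_{0,\gamma}^2 \Bigr)^{1/2}.
\end{aligned}
\]

Use this bound in (3.53) and apply the Cauchy-Schwarz inequality to get

\[
\sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\bar{R}_\gamma\|_{0,\gamma}^2 \le C \Bigl( \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|R_\gamma - \bar{R}_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^2\,\|r_{\tilde{\gamma}}\|_{0,\tilde{\gamma}}^2 + \|e_h\|_{1,\Omega}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\lambda - \lambda_h\|_{0,\gamma}^2 \Bigr). \tag{3.54}
\]

Combining (3.47) and (3.54), we finally conclude that

\[
\sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|R_\gamma\|_{0,\gamma}^2 \le C \Bigl( \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|R_\gamma - \bar{R}_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma^2\,\|\bar{r}_{\tilde{\gamma}} - r_{\tilde{\gamma}}\|_{0,\tilde{\gamma}}^2 + \|e_h\|_{1,\Omega}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\lambda - \lambda_h\|_{0,\gamma}^2 \Bigr). \tag{3.55}
\]
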


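Summarizing (3.47), (3.50), and (3.55), we have

\[
\eta_R^2 \le C \Bigl( \|e_h\|_{1,\Omega}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|\lambda - \lambda_h\|_{0,\gamma}^2 + \sum_{K \in \mathcal{P}_h} h_K^2\,\|r_K - \bar{r}_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma\,\|R_\gamma - \bar{R}_\gamma\|_{0,\gamma}^2 \Bigr). \tag{3.56}
\]

For our model problem, in the element residual $r_K$ of (3.37) and the side residual
$R_\gamma$ of (3.38), the terms $-\Delta u_h + u_h$ in $K$ and $\partial u_h/\partial \nu$ on $\gamma$ are polynomials. Therefore,
the terms $\|r_K - \bar{r}_K\|_{0,K}$ and $\|R_\gamma - \bar{R}_\gamma\|_{0,\gamma}$ in the right-hand side of (3.56) can
be replaced by $\|f - f_K\|_{0,K}$ and $\|\lambda_h - \lambda_{h,\gamma}\|_{0,\gamma}$, with discontinuous piecewise
polynomial approximations $f_K$ and $\lambda_{h,\gamma}$.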


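Theorem 3.9. Let $\eta_R$ be defined as in (3.42). Then

\[
\eta_R^2 \le C \Big( \|e_h\|_{1,\Omega}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda - \lambda_h\|_{0,\gamma}^2 + \sum_{K \in \mathcal{P}_h} h_K^2 \|f - f_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda_h - \lambda_{h,\gamma}\|_{0,\gamma}^2 \Big) \tag{3.57}
\]

with discontinuous piecewise polynomial approximations $f_K$, $\lambda_{h,\gamma}$ of $f$, $\lambda_h$.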


Let us comment on the three summation terms in (3.57). As long as $f$ has
a suitable degree of smoothness, the approximation error
$\sum_{K \in \mathcal{P}_h} h_K^2 \|f - f_K\|_{0,K}^2$ will be of higher order than $\|e_h\|_{1,\Omega}^2$. Due to the inequality nature
of the variational problem, in the efficiency bound (3.57) of the error
estimator, there are extra terms involving $\lambda$ and $\lambda_h$. A sharp bound of the
term $\sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda - \lambda_h\|_{0,\gamma}^2$ is currently an open problem. Nevertheless, in
Section 3.6, we present numerical results showing that the presence of this
term in (3.57) does not have an effect on the efficiency of the error estimator.
Similar numerical results can also be used as evidence that the term
$\sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda_h - \lambda_{h,\gamma}\|_{0,\gamma}^2$ does not have an effect on the efficiency of the
error estimator.


3.5 Recovery-Based Error Estimates for the Model 
Elliptic Variational Inequality 


An important class of a posteriori error estimates is based on local or global
averaging of the gradient, for example, in the form of the Zienkiewicz-Zhu
gradient recovery technique [78, 79, 80]. It is known that in the case of structured
grids and higher regularity solutions, such estimators are both efficient and
reliable. Some work has been done for unstructured meshes as well (e.g.,
[74, 72]). In [8, 23], Carstensen and Bartels proved that all averaging
techniques provide reliable a posteriori error control for the Laplace
equation with mixed boundary conditions on unstructured grids also. In the
context of solving variational inequalities, gradient recovery type error
estimates for elliptic obstacle problems have been derived recently in [9, 71].

In this section, we study a gradient recovery type error estimator for the 
finite element solution of the model problem, and we restrict our discussion 
to linear elements. Then all the nodes are also the vertices. To formulate the 
error estimator we need a gradient recovery operator. There are many types of 
gradient recovery operators. In order to have a “good” approximation of the 
true gradient $\nabla u$, a set of conditions to be satisfied by the recovery operator
was identified in [1]. These conditions lead to a more precise characterization
of the form of the gradient recovery operator, summarized in Lemma 4.5 in
[1]. In particular, the recovered gradient at a node $a$ is a linear combination
of the values of $\nabla u_h$ in a patch surrounding $a$.

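We define the gradient recovery operator $G_h : V_h \to (V_h)^d$ as follows:

\[
G_h v_h(x) = \sum_{a \in \mathcal{N}_h} G_h v_h(a) \, \varphi_a(x), \qquad G_h v_h(a) = \frac{1}{|\tilde{K}_a|} \int_{\tilde{K}_a} \nabla v_h \, dx. \tag{3.58}
\]

Linear elements are used, therefore

\[
G_h v_h(a) = \sum_{i=1}^{N_a} \alpha_a^i \, (\nabla v_h)_{K_a^i}, \tag{3.59}
\]

where $(\nabla v_h)_{K_a^i}$ denotes the vector value of the gradient $\nabla v_h$ on the element
$K_a^i$, $\tilde{K}_a = \bigcup_{i=1}^{N_a} K_a^i$, and $\alpha_a^i = |K_a^i| / |\tilde{K}_a|$, $i = 1, \ldots, N_a$.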
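
For linear elements, the recovery formula (3.59) is an area-weighted nodal average of the elementwise constant gradients. The following NumPy sketch illustrates this for a triangular mesh in two dimensions; the function name `recover_gradient` and the array layout (node coordinates, triangle connectivity, nodal values) are illustrative assumptions rather than anything prescribed in the text.

```python
import numpy as np

def recover_gradient(nodes, triangles, u):
    """Nodal values of the recovered gradient for P1 elements.

    nodes:     (n, 2) array of vertex coordinates (assumed layout)
    triangles: (ne, 3) array of vertex indices per element (assumed layout)
    u:         (n,) array of nodal values of the finite element function
    Every node is assumed to belong to at least one triangle.
    """
    nodes = np.asarray(nodes, dtype=float)
    triangles = np.asarray(triangles, dtype=int)
    u = np.asarray(u, dtype=float)
    grad_sum = np.zeros((len(nodes), 2))   # sum over the patch of |K| * (grad u_h)_K
    area_sum = np.zeros(len(nodes))        # patch area at each node
    for tri in triangles:
        p0, p1, p2 = nodes[tri]
        B = np.array([p1 - p0, p2 - p0])   # rows are two edge vectors of the element
        area = 0.5 * abs(np.linalg.det(B))
        du = np.array([u[tri[1]] - u[tri[0]], u[tri[2]] - u[tri[0]]])
        grad = np.linalg.solve(B, du)      # constant gradient on this element
        grad_sum[tri] += area * grad
        area_sum[tri] += area
    # Area-weighted average over the patch of each node
    return grad_sum / area_sum[:, None]
```

For a globally linear function the elementwise gradients coincide, so the recovered gradient reproduces the exact gradient at every node.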

Recall from Section 3.4 that residual-type error estimates are derived by
applying Theorem 3.7 with $r^* = -(\nabla u_h, u_h)$, where $u_h$ is the finite element
solution. In this section, we consider a different choice.


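Theorem 3.10. Let $u$ and $u_h$ be the unique solutions of (3.5) and (3.16),
respectively. Then

\[
\|u - u_h\|_{1,\Omega}^2 \le C \eta_G^2 + C \sum_{a \in \mathcal{N}_{h,0}} \Big( h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 + h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big), \tag{3.60}
\]

where

\[
\eta_G^2 = \sum_{K \in \mathcal{P}_h} \eta_{G,K}^2, \tag{3.61}
\]

\[
\eta_{G,K}^2 = \|\nabla u_h - G_h u_h\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma \|G_h u_h \cdot \nu_\gamma + g\lambda_h\|_{0,\gamma}^2. \tag{3.62}
\]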


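Proof. Let $\lambda_h \in L^\infty(\Gamma_2)$ be provided by Theorem 3.3. Apply Theorem 3.7
with $w = u_h$ and $r^* = -(G_h u_h, u_h)$ to obtain

\[
\frac{1}{2} a(u - u_h, u - u_h) \le \int_\Omega |\nabla u_h - G_h u_h|^2 \, dx + R^2, \tag{3.63}
\]

where

\[
R = \sup_{v \in V} \frac{1}{\|v\|_V} \left\{ \int_\Omega \left[ G_h u_h \cdot \nabla v + (u_h - f) v \right] dx + \int_{\Gamma_2} g\lambda_h v \, ds \right\}. \tag{3.64}
\]

Let $\Pi_h$ be the interpolation operator defined by (3.11). Use (3.17) with
$v_h = \Pi_h v$:

\[
\int_\Omega \left( \nabla u_h \cdot \nabla \Pi_h v + (u_h - f) \Pi_h v \right) dx + \int_{\Gamma_2} g\lambda_h \Pi_h v \, ds = 0.
\]

Therefore, we can rewrite (3.64) as

\[
\begin{aligned}
R = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ & \int_\Omega \left[ G_h u_h \cdot \nabla (v - \Pi_h v) + (u_h - f)(v - \Pi_h v) \right] dx \\
& + \int_{\Gamma_2} g\lambda_h (v - \Pi_h v) \, ds + \int_\Omega (G_h u_h - \nabla u_h) \cdot \nabla \Pi_h v \, dx \Big\}.
\end{aligned}
\]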

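By (3.12), we have

\[
\|\Pi_h v\|_{1,\Omega} \le C \|v\|_V,
\]

and so,

\[
\sup_{v \in V} \frac{1}{\|v\|_V} \int_\Omega (G_h u_h - \nabla u_h) \cdot \nabla \Pi_h v \, dx \le C \|\nabla u_h - G_h u_h\|_{0,\Omega}.
\]

Thus,

\[
R \le C \|\nabla u_h - G_h u_h\|_{0,\Omega} + R_1, \tag{3.65}
\]

where

\[
R_1 = \sup_{v \in V} \frac{1}{\|v\|_V} \left\{ \int_\Omega \left[ G_h u_h \cdot \nabla (v - \Pi_h v) + (u_h - f)(v - \Pi_h v) \right] dx + \int_{\Gamma_2} g\lambda_h (v - \Pi_h v) \, ds \right\}.
\]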


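Integrate by parts over each element $K \in \mathcal{P}_h$ to get

\[
\begin{aligned}
R_1 = \sup_{v \in V} \frac{1}{\|v\|_V} \sum_{K \in \mathcal{P}_h} \Big\{ & \int_K \left( -\mathrm{div}(G_h u_h) + u_h - f \right)(v - \Pi_h v) \, dx \\
& + \sum_{\gamma \in \mathcal{E}(K)} \int_\gamma G_h u_h \cdot \nu_\gamma \, (v - \Pi_h v) \, ds \\
& + \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} \int_\gamma g\lambda_h (v - \Pi_h v) \, ds \Big\}.
\end{aligned} \tag{3.66}
\]

Because $G_h u_h$ is continuous, the integrals on the interior sides $\gamma \in \mathcal{E}_{h,0}$ cancel
each other. Write

\[
-\mathrm{div}(G_h u_h) + u_h - f = \mathrm{div}(\nabla u_h - G_h u_h) + (-\Delta u_h + u_h - f)
\]

and rearrange the terms in (3.66) to obtain

\[
\begin{aligned}
R_1 &= \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ \sum_{K \in \mathcal{P}_h} \int_K \mathrm{div}(\nabla u_h - G_h u_h)(v - \Pi_h v) \, dx \\
&\qquad + \sum_{K \in \mathcal{P}_h} \int_K r_K (v - \Pi_h v) \, dx \\
&\qquad + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} \int_\gamma (G_h u_h \cdot \nu_\gamma + g\lambda_h)(v - \Pi_h v) \, ds \Big\} \\
&\equiv \sup_{v \in V} \frac{1}{\|v\|_V} \{ I + II + III \},
\end{aligned} \tag{3.67}
\]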
where $r_K = -\Delta u_h + u_h - f = u_h - f$ denotes the interior residual on element
$K \in \mathcal{P}_h$. We use $r$ to denote the piecewise interior residual; that is, $r|_K = r_K$
for $K \in \mathcal{P}_h$. To estimate the first summand on the right side of (3.67), we
use an elementwise inverse inequality of the form

\[
\|\mathrm{div}(\nabla u_h - G_h u_h)\|_{0,K} \le C h_K^{-1} \|\nabla u_h - G_h u_h\|_{0,K}. \tag{3.68}
\]


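Apply the Cauchy-Schwarz inequality, the inverse inequality (3.68), and the
estimate (3.14) to get

\[
\begin{aligned}
I &\le \sum_{K \in \mathcal{P}_h} \|\mathrm{div}(\nabla u_h - G_h u_h)\|_{0,K} \|v - \Pi_h v\|_{0,K} \\
&\le C \sum_{K \in \mathcal{P}_h} \|\nabla u_h - G_h u_h\|_{0,K} \|h_K^{-1}(v - \Pi_h v)\|_{0,K} \\
&\le C \Big( \sum_{K \in \mathcal{P}_h} \|\nabla u_h - G_h u_h\|_{0,K}^2 \Big)^{1/2} \Big( \sum_{K \in \mathcal{P}_h} \|h_K^{-1}(v - \Pi_h v)\|_{0,K}^2 \Big)^{1/2} \\
&\le C |v|_{1,\Omega} \Big( \sum_{K \in \mathcal{P}_h} \|\nabla u_h - G_h u_h\|_{0,K}^2 \Big)^{1/2}.
\end{aligned} \tag{3.69}
\]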


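For the second summand, we apply the estimate (3.13) to obtain

\[
II \le C |v|_{1,\Omega} \Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{r_a \in \mathbb{R}} \|r - r_a\|_{0,\tilde{K}_a}^2 \Big)^{1/2}.
\]

Now

\[
\sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{r_a \in \mathbb{R}} \|r - r_a\|_{0,\tilde{K}_a}^2 \le 2 \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \|u_h - \bar{u}_h\|_{0,\tilde{K}_a}^2 + 2 \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2,
\]

where $\bar{u}_h$ denotes the integral mean of $u_h$ over $\tilde{K}_a$. Use the Poincaré inequality
and an inverse inequality of the form (3.68) to get

\[
\sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{r_a \in \mathbb{R}} \|r - r_a\|_{0,\tilde{K}_a}^2 \le C \sum_{a \in \mathcal{N}_{h,0}} h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 + 2 \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2.
\]

Therefore,

\[
II \le C |v|_{1,\Omega} \Big( \sum_{a \in \mathcal{N}_{h,0}} \Big( h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 + h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big) \Big)^{1/2}. \tag{3.70}
\]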


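Finally, with the aid of the Cauchy-Schwarz inequality and the estimate
(3.15), the third summand on the right side of (3.67) can be bounded by

\[
III \le C |v|_{1,\Omega} \Big( \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|G_h u_h \cdot \nu_\gamma + g\lambda_h\|_{0,\gamma}^2 \Big)^{1/2}. \tag{3.71}
\]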

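Inserting (3.69) through (3.71) into (3.67), and using the Cauchy-Schwarz
inequality and (3.65), we deduce that

\[
R \le C \Big( \sum_{K \in \mathcal{P}_h} \|\nabla u_h - G_h u_h\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|G_h u_h \cdot \nu_\gamma + g\lambda_h\|_{0,\gamma}^2 + \sum_{a \in \mathcal{N}_{h,0}} \Big( h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 + h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big) \Big)^{1/2}. \tag{3.72}
\]

Split the first term on the right side of estimate (3.63) into local contributions
from each $K \in \mathcal{P}_h$ and insert (3.72) to conclude the proof.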


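We can write (3.60) as

\[
\|u - u_h\|_{1,\Omega} \le C \eta_G + R_h, \tag{3.73}
\]

where

\[
R_h = C \Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 \Big)^{1/2} + C \Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big)^{1/2}.
\]

We observe that usually the term $R_h$ is of higher order compared to
$\|u - u_h\|_{1,\Omega}$, which is of order $O(h)$ in the nondegenerate situations. This
observation is argued as follows. First, it is easy to show from the definition of
the finite element solutions that there is a constant $C$ such that $\|u_h\|_{1,\Omega} \le C$
for any $h$. So the term

\[
\Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^4 \|\nabla u_h\|_{0,\tilde{K}_a}^2 \Big)^{1/2}
\]

is bounded by $O(h^2)$. Next, for $f \in L^2(\Omega)$,

\[
\Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big)^{1/2} = o(h);
\]

and if $f \in H^1(\Omega)$, then

\[
\Big( \sum_{a \in \mathcal{N}_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}} \|f - f_a\|_{0,\tilde{K}_a}^2 \Big)^{1/2} = O(h^2).
\]

Thus, (3.73) illustrates the reliability of the error estimator $\eta_G$.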

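We now turn to the efficiency of the estimator. We relate the gradient
recovery-based estimator $\eta_G$ to the residual type estimator $\eta_R$. Recall that
the residual-type estimator $\eta_{R,K}$ is defined in (3.43) with the interior
residual $r_K = -\Delta u_h + u_h - f$ in $K$ and the side residual

\[
R_\gamma = \begin{cases} \left[ \dfrac{\partial u_h}{\partial \nu} \right]_\gamma & \text{if } \gamma \in \mathcal{E}_{h,0}, \\[2ex] \dfrac{\partial u_h}{\partial \nu} + g\lambda_h & \text{if } \gamma \in \mathcal{E}_{h,\Gamma_2}. \end{cases} \tag{3.74}
\]

For the error estimator $\eta_R = \big( \sum_{K \in \mathcal{P}_h} \eta_{R,K}^2 \big)^{1/2}$, we have the inequality (3.57) for
its efficiency.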


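Lemma 3.1. Let $\eta_{G,K}$ be defined in (3.62). Then the following bound holds:

\[
\eta_{G,K}^2 \le C \Big( \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma \|R_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in \mathcal{E}_{\tilde{K}}} h_\gamma \|R_\gamma\|_{0,\gamma}^2 \Big), \tag{3.75}
\]

where $\mathcal{E}_{\tilde{K}}$ denotes the set of inner sides of the patch $\tilde{K}$ corresponding to the
element $K$.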

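Proof. It follows from the definition of $G_h$ that we have on each element $K$,

\[
|\nabla u_h - G_h u_h|^2 = \Big| \sum_{a \in \mathcal{N}(K)} \varphi_a \sum_{i=1}^{N_a} \alpha_a^i \big( (\nabla u_h)_K - (\nabla u_h)_{K_a^i} \big) \Big|^2 \le C \sum_{K' \subset \tilde{K}} |(\nabla u_h)_K - (\nabla u_h)_{K'}|^2. \tag{3.76}
\]

For any $K' \subset \tilde{K}$, there is a sequence of inner edges $\gamma_1, \ldots, \gamma_m$ such that
$\gamma_i \cap \gamma_{i+1} \ne \emptyset$ and $K \supset \gamma_1$, $K' \supset \gamma_m$. Hence,

\[
|(\nabla u_h)_K - (\nabla u_h)_{K'}| \le \sum_{i=1}^m |[\nabla u_h]_{\gamma_i}| \le \sum_{\gamma \in \mathcal{E}_{\tilde{K}}} |[\nabla u_h]_\gamma|. \tag{3.77}
\]

Because $u_h$ is continuous on $\Omega$, $[\partial u_h/\partial t]_\gamma = 0$ for all $\gamma \in \mathcal{E}_{h,0}$, where $\partial u_h/\partial t$ is
the tangential derivative of $u_h$. Therefore, $|[\nabla u_h]_\gamma| = |[\partial u_h/\partial \nu]_\gamma|$ if $\gamma \in \mathcal{E}_{h,0}$.
The estimates (3.76) and (3.77) together with the shape regularity of the
partition $\mathcal{P}_h$ imply

\[
\|\nabla u_h - G_h u_h\|_{0,K}^2 \le C h_K^2 \sum_{\gamma \in \mathcal{E}_{\tilde{K}}} \left| \left[ \frac{\partial u_h}{\partial \nu} \right]_\gamma \right|^2 \le C \sum_{\gamma \in \mathcal{E}_{\tilde{K}}} h_\gamma \int_\gamma R_\gamma^2 \, ds. \tag{3.78}
\]

Let $K \in \mathcal{P}_h$ be such that $\mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2} = \emptyset$. It follows from (3.78) and the
definitions of $\eta_{G,K}$ and $R_\gamma$ that

\[
\eta_{G,K}^2 \le C \sum_{\gamma \in \mathcal{E}_{\tilde{K}}} h_\gamma \|R_\gamma\|_{0,\gamma}^2. \tag{3.79}
\]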

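Consider now the case when the element $K$ has at least one side lying on
the boundary $\Gamma_2$. Let $\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}$. Apply the triangle inequality to get

\[
\begin{aligned}
\eta_{G,K}^2 &= \|\nabla u_h - G_h u_h\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma \|G_h u_h \cdot \nu_\gamma + g\lambda_h\|_{0,\gamma}^2 \\
&\le \|\nabla u_h - G_h u_h\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma \big( \|(\nabla u_h - G_h u_h) \cdot \nu_\gamma\|_{0,\gamma} + \|R_\gamma\|_{0,\gamma} \big)^2 \\
&\le \|\nabla u_h - G_h u_h\|_{0,K}^2 + 2 \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_2}} h_\gamma \big( \|(\nabla u_h - G_h u_h) \cdot \nu_\gamma\|_{0,\gamma}^2 + \|R_\gamma\|_{0,\gamma}^2 \big).
\end{aligned} \tag{3.80}
\]

From an inverse inequality and (3.78), we have

\[
h_\gamma \|(\nabla u_h - G_h u_h) \cdot \nu_\gamma\|_{0,\gamma}^2 \le h_K \|\nabla u_h - G_h u_h\|_{0,\partial K}^2 \le C \|\nabla u_h - G_h u_h\|_{0,K}^2 \le C \sum_{\gamma' \in \mathcal{E}_{\tilde{K}}} h_{\gamma'} \|R_{\gamma'}\|_{0,\gamma'}^2. \tag{3.81}
\]

Inserting (3.78) and (3.81) into (3.80) concludes the proof.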


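From Lemma 3.1 and the inequality (3.57) we obtain

\[
\eta_G^2 \le C \Big( \|u - u_h\|_{1,\Omega}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda - \lambda_h\|_{0,\gamma}^2 + \sum_{K \in \mathcal{P}_h} h_K^2 \|f - f_K\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_2}} h_\gamma \|\lambda_h - \lambda_{h,\gamma}\|_{0,\gamma}^2 \Big), \tag{3.82}
\]

with discontinuous piecewise polynomial approximations $f_K$ and $\lambda_{h,\gamma}$. The
comments at the end of Section 3.4 apply to the three summation terms in
(3.82).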


3.6 Numerical Example on the Model Elliptic 
Variational Inequality 


In this section, we provide some numerical results on a two-dimensional ellip- 
tic variational inequality to illustrate the effectiveness of the error estimators 
$\eta_R$ of (3.42)-(3.43) and $\eta_G$ of (3.61)-(3.62). We use triangular partitioning
and linear elements for discretization, and a seven-point Gauss-Legendre
quadrature to compute the load vector on each triangle. Numerical integration
over a general triangle is done by the reference element technique. On
the reference element

\[
\hat{K} = \{ (\xi, \eta) : \xi \ge 0, \ \eta \ge 0, \ 1 - \xi - \eta \ge 0 \},
\]

the seven-point Gauss-Legendre quadrature formula is defined by

\[
\int_{\hat{K}} F(\xi, \eta) \, d\xi \, d\eta \approx \sum_{i=1}^{7} \omega_i F(\xi_i, \eta_i),
\]

where the nodes $\{(\xi_i, \eta_i)\}_{i=1}^7$ and weights $\{\omega_i\}_{i=1}^7$ are given in Table 3.1.
The discretized solution is computed by solving the equivalent minimization 
problem using an overrelaxation method with a relative error tolerance, in 
the maximum norm, of 10~° (see [34, 35]). 
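
To illustrate the reference element rule, the sketch below implements the classical seven-point, degree-five Gauss formula on the reference triangle (often attributed to Radon). That its nodes and weights coincide with those of Table 3.1 is an assumption; here the weights are scaled to sum to the area 1/2 of the reference triangle, and the function names are hypothetical.

```python
import math

def seven_point_rule():
    """Nodes and weights of a degree-5, seven-point rule on the reference triangle.

    Assumption: this is the standard Radon formula, with weights scaled so that
    they sum to 1/2, the area of {xi >= 0, eta >= 0, xi + eta <= 1}.
    """
    s = math.sqrt(15.0)
    a = (6.0 - s) / 21.0
    b = (6.0 + s) / 21.0
    wa = (155.0 - s) / 2400.0
    wb = (155.0 + s) / 2400.0
    points = [
        (1.0 / 3.0, 1.0 / 3.0),                       # centroid
        (a, a), (a, 1.0 - 2.0 * a), (1.0 - 2.0 * a, a),
        (b, b), (b, 1.0 - 2.0 * b), (1.0 - 2.0 * b, b),
    ]
    weights = [9.0 / 80.0, wa, wa, wa, wb, wb, wb]
    return points, weights

def integrate_reference(F):
    """Approximate the integral of F(xi, eta) over the reference triangle."""
    points, weights = seven_point_rule()
    return sum(w * F(xi, eta) for (xi, eta), w in zip(points, weights))
```

The rule integrates polynomials of total degree up to five exactly, which can be checked against the closed form $\int_{\hat{K}} \xi^p \eta^q \, d\xi \, d\eta = p! \, q! / (p + q + 2)!$.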

In order to show the effectiveness of the adaptive procedure we compare 
numerical convergence orders of the approximate solutions. We compute these 
orders by considering families of uniform and adaptively refined partitions. 
Consider a sequence of finite element solutions $u_h^{\mathrm{un}}$ based on uniform
partitions of the domain $\Omega$. Starting with an initial coarse partition $\mathcal{P}_1$, we
construct a family of nested meshes by subdividing each element into four
congruent elements for the two-dimensional case. The solution from the most
refined mesh is taken as the “true” solution $u$ used to compute the errors of
the approximate solutions obtained on the other meshes.

Table 3.1 Nodes and weights of a quadrature over reference triangle

6 | (6 − √15)/21 | (9 + 2√15)/21 | (155 − √15)/240
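
Numerical convergence orders can be estimated from consecutive pairs of errors. The sketch below is one way to do this when errors are reported against the number of degrees of freedom; it assumes the relation $h \sim N^{-1/d}$ between meshsize and number of degrees of freedom in $d$ space dimensions, and the function name is hypothetical.

```python
import math

def observed_orders(dofs, errors, dim=2):
    """Observed convergence orders with respect to the meshsize h.

    Assumption: h behaves like N**(-1/dim), where N is the number of degrees
    of freedom, so error ~ h**p corresponds to error ~ N**(-p/dim).
    """
    orders = []
    for i in range(len(dofs) - 1):
        rate = math.log(errors[i] / errors[i + 1]) / math.log(dofs[i + 1] / dofs[i])
        orders.append(dim * rate)
    return orders
```

For instance, in two dimensions an error that halves each time the number of degrees of freedom is multiplied by four corresponds to an observed order of one.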
Adaptive finite element solutions are obtained by the following algorithm. 


1. Start with the initial partition $\mathcal{P}_h$ and corresponding finite element subspace $V_h$.

2. Compute the finite element solution $u_h^{\mathrm{ad}} \in V_h$.

3. For each element $K \in \mathcal{P}_h$ compute the error estimator $\eta_K$, defined in
(3.43) for the residual type and (3.62) for the gradient recovery type.

4. Let n = >-xep, MK with N being the number of elements in partition 
$\mathcal{P}_h$. An element $K$ is marked for refinement if $\eta_K > \mu \eta$, where $\mu$ is a
prescribed threshold. In the example of this section, $\mu = 0.5$.

5. Perform refinement and obtain a new triangulation $\mathcal{P}_h$.

6. Return to step 2.
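
The marking step of the algorithm above can be sketched as follows. Purely for illustration, the reference value $\eta$ is taken here to be the mean of the element indicators over the $N$ elements; this normalization, like the function name `mark_elements`, is an assumption of the sketch and not a statement of the algorithm's definition.

```python
import numpy as np

def mark_elements(eta_K, mu=0.5):
    """Return indices of elements marked for refinement.

    eta_K: array of element error indicators, one per element
    mu:    prescribed threshold
    Assumption: the reference value eta is the mean indicator over all elements.
    """
    eta_K = np.asarray(eta_K, dtype=float)
    eta_ref = eta_K.mean()
    # An element is marked when its indicator exceeds mu times the reference value
    return np.flatnonzero(eta_K > mu * eta_ref)
```

A larger threshold marks fewer elements per adaptive step, so more steps are needed to reach a given number of degrees of freedom.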




In the computation of the error indicator $\eta_K$ we make use of the multiplier
$\lambda_h$ defined on $\Gamma_2 \subset \Gamma$. In what follows we describe how $\lambda_h$ can be (approximately)
recovered from the solution $u_h$ using characterization (3.17). We
compute the piecewise constant and the piecewise linear approximations to
the Lagrange multiplier.

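Denote by $\{a^i\}_{i=1}^m$ the nodes of the partition $\mathcal{P}_h$ belonging to $\Gamma_2$. Let
$\{\varphi_i\}_{i=1}^m$ be the basis functions corresponding to the nodes $\{a^i\}$. Let $\{\chi_i\}_{i=1}^m$
be the characteristic functions of the intervals $\{K_i\}$ belonging to $\Gamma_2$ and
defined as follows: $K_i$ is the intersection of $\Gamma_2$ and the segment joining the
midpoints of edges sharing $a^i$ as a common point. We first determine a
piecewise constant function

\[
\tilde{\lambda}_h^{(0)} = \sum_{i=1}^m \tilde{\lambda}_{h,i}^{(0)} \chi_i
\]

or a piecewise linear function

\[
\tilde{\lambda}_h^{(1)} = \sum_{i=1}^m \tilde{\lambda}_{h,i}^{(1)} \varphi_i
\]

by requiring an analogue of (3.17):

\[
a(u_h, v_h) + \int_{\Gamma_2} g \tilde{\lambda}_h^{(k)} v_h \, ds = \ell(v_h) \quad \forall v_h \in V_h \tag{3.83}
\]

with $k = 0$ (piecewise constant) or $k = 1$ (piecewise linear). Denote
$\tilde{\boldsymbol{\lambda}}^{(k)} = (\tilde{\lambda}_{h,1}^{(k)}, \ldots, \tilde{\lambda}_{h,m}^{(k)})^T$, $k = 0, 1$. We then project the components of $\tilde{\boldsymbol{\lambda}}^{(k)}$ onto the
interval $[-1, 1]$ to get $\boldsymbol{\lambda}^{(k)} = (\lambda_{h,1}^{(k)}, \ldots, \lambda_{h,m}^{(k)})^T$:

\[
\lambda_{h,i}^{(k)} = \max\{ \min\{ \tilde{\lambda}_{h,i}^{(k)}, 1 \}, -1 \}, \quad i = 1, \ldots, m.
\]

The piecewise constant approximation $\lambda_h^{(0)}$ and the piecewise linear
approximation $\lambda_h^{(1)}$ of the multiplier $\lambda_h$ on $\Gamma_2 \subset \Gamma$ can be computed as

\[
\lambda_h^{(0)} = \sum_{i=1}^m \lambda_{h,i}^{(0)} \chi_i \quad \text{and} \quad \lambda_h^{(1)} = \sum_{i=1}^m \lambda_{h,i}^{(1)} \varphi_i. \tag{3.84}
\]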
i=1 


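We briefly comment on the method for finding $\tilde{\boldsymbol{\lambda}}^{(k)}$, $k = 0, 1$. Let $n =
\dim V_h$. Denote by $K$ the standard $(n \times n)$ stiffness matrix and by $l \in \mathbb{R}^n$
the standard load vector. Let $u \in \mathbb{R}^n$ be the nodal value vector of the finite
element solution $u_h$. Then the algebraic representation of (3.83) becomes

\[
(K u, v)_{\mathbb{R}^n} + (g M^{(k)} \tilde{\boldsymbol{\lambda}}^{(k)}, v_c)_{\mathbb{R}^m} = (l, v)_{\mathbb{R}^n} \quad \forall v \in \mathbb{R}^n, \tag{3.85}
\]

where $v_c$ denotes the subvector of $v$ containing the nodal values of $v_h$ at
the nodes $\{a^i\}_{i=1}^m \subset \Gamma_2$ and $M^{(k)}$ is a sparse $(m \times m)$ matrix. We can write
$v = (v_i^T, v_c^T)^T \in \mathbb{R}^{n-m} \times \mathbb{R}^m$ by assuming that the components of $v_c$ are
listed last. We similarly split $l$ into $l_i$ and $l_c$. This decomposition yields a block
structure for $K$,

\[
K = \begin{pmatrix} K_{ii} & K_{ic} \\ K_{ci} & K_{cc} \end{pmatrix}.
\]

Then (3.85) is equivalent to the following two relations:

\[
\begin{aligned}
K_{ii} u_i + K_{ic} u_c &= l_i, \\
K_{ci} u_i + K_{cc} u_c + g M^{(k)} \tilde{\boldsymbol{\lambda}}^{(k)} &= l_c.
\end{aligned}
\]

Once the approximate solution $u_h$ is computed, we can obtain from the second
relation that

\[
\tilde{\boldsymbol{\lambda}}^{(k)} = \frac{1}{g} (M^{(k)})^{-1} (l_c - K_{ci} u_i - K_{cc} u_c), \quad k = 0, 1.
\]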
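
The recovery of the multiplier from the second block relation, followed by the projection onto $[-1, 1]$, can be sketched with dense NumPy arrays as below. The function name, the argument layout, and the use of an index array for the nodes on the contact boundary are assumptions made for illustration; in practice the matrices would be sparse.

```python
import numpy as np

def recover_multiplier(K, l, u, contact, M, g):
    """Recover nodal multiplier values from the algebraic system.

    K, l:    stiffness matrix (n x n) and load vector (n,)
    u:       nodal values of the finite element solution (n,)
    contact: indices of the m nodes on the boundary part carrying the multiplier
    M:       the (m x m) matrix multiplying the multiplier vector
    g:       positive constant in front of the boundary integral
    Returns the unprojected values and their projection onto [-1, 1].
    """
    K = np.asarray(K, dtype=float)
    l = np.asarray(l, dtype=float)
    u = np.asarray(u, dtype=float)
    contact = np.asarray(contact, dtype=int)
    # Rows of K at the contact nodes give K_ci u_i + K_cc u_c in one product
    residual = l[contact] - K[contact, :] @ u
    lam_tilde = np.linalg.solve(np.asarray(M, dtype=float), residual) / g
    lam = np.clip(lam_tilde, -1.0, 1.0)   # componentwise projection onto [-1, 1]
    return lam_tilde, lam
```

Using the full rows of $K$ at the contact nodes avoids reordering the unknowns so that the contact components come last.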


We use $u_h^{\mathrm{un}}$ for finite element solutions on uniform meshes, and $u_h^{\mathrm{ad}}$ for
finite element solutions on adaptive meshes. We find that uses of piecewise
linears and piecewise constants for the Lagrange multiplier lead to negligible
differences in the adaptive meshes. For instance, in Example 3.1, we get
identical adaptive meshes for the first three adaptive steps, and there is a
difference of only one or two nodes for the fourth and fifth adaptive steps.
Thus, in all the numerical examples in this chapter, $u_h^{\mathrm{ad}}$ refers to the adaptive
finite element solution where piecewise linear functions are used for
approximating the Lagrange multiplier in generating the adaptive mesh. Because
adaptive solutions are involved, numerical solution errors are plotted against
the number of degrees of freedom, rather than the meshsize.


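Example 3.1. Let $\Omega = [0,1] \times [0,1]$ and $\Gamma_2 = \Gamma$. We solve the problem of
finding $u \in H^1(\Omega)$ such that $\forall v \in H^1(\Omega)$,

\[
\int_\Omega \left[ \nabla u \cdot \nabla (v - u) + u (v - u) \right] dx + g \int_\Gamma |v| \, ds - g \int_\Gamma |u| \, ds \ge \int_\Omega f (v - u) \, dx,
\]

where

\[
f = -\Delta w + w,
\]

\[
w = w_1 - w_2,
\]

and for $i = 1, 2$,

\[
w_i(x) = \begin{cases} \exp\big( 1/(r_i^2 - 1) \big) & \text{if } r_i < 1, \\ 0 & \text{otherwise} \end{cases}
\]

with $r_i = \big[ (x_1 - x_1^{(i)})^2 + (x_2 - x_2^{(i)})^2 \big]^{1/2} / \varepsilon_i$. For the numerical results reported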


below, we let $g = 1$, $x_1^{(1)} = 0.8$, $x_1^{(2)} = 0.1$, $x_2^{(1)} = 0.3$, $x_2^{(2)} = 0.1$, $\varepsilon_1 = 0.25$,
and $\varepsilon_2 = 0.2$.

We start with a coarse uniform triangulation shown on the left plot in 
Figure 3.2. Here, the interval [0, 1] is divided into 1/h equal parts with h = 
1/4 which is successively halved. The numerical solution corresponding to 
h = 1/256 (66,049 nodes) is taken as the “true” solution u, shown in Figure 
3.1. 

We use the regular refinement technique (red-blue-green refinement), in 
which the triangle is divided into four triangles by joining the midpoints of 
edges and adjacent triangles are refined in order to avoid hanging nodes. For 
a detailed description of this and other refinement techniques currently used 
see, for example, [70] and references therein. Also, in order to improve the 
quality of triangulation, a smoothing procedure is used after each refinement. 
For each triangle $K$ of the triangulation we compute the triangle quality
measure defined by

\[
Q(K) = \frac{4 \sqrt{3} \, \mathrm{area}(K)}{h_1^2 + h_2^2 + h_3^2},
\]

where $h_i$, $i = 1, 2, 3$, are the side lengths of the triangle $K$. Note that $Q(K) = 1$
if $h_1 = h_2 = h_3$. A triangle is viewed to be of acceptable quality if $Q > 0.6$;
otherwise we modify the mesh by moving the interior nodes toward the center
of mass of the polygon formed by the adjacent triangles. The adaptively
refined triangulation after five iterations is shown on the right plot of Figure
3.2.

Fig. 3.1 “True” solution.
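
The triangle quality measure $Q(K)$ is straightforward to evaluate from the vertex coordinates. A minimal sketch, with a hypothetical function name and vertices given as coordinate pairs:

```python
import math

def triangle_quality(p1, p2, p3):
    """Quality measure Q(K) = 4*sqrt(3)*area(K) / (h1^2 + h2^2 + h3^2)."""
    # Area from the cross product of two edge vectors
    area = 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    # Sum of squared side lengths
    sides_sq = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                   for a, b in ((p1, p2), (p2, p3), (p3, p1)))
    return 4.0 * math.sqrt(3.0) * area / sides_sq
```

An equilateral triangle gives $Q = 1$, while a right isosceles triangle gives $Q = \sqrt{3}/2 \approx 0.866$, so both pass the acceptance test $Q > 0.6$.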


To have an idea of the convergence behaviour of the discrete Lagrange
multipliers, we analyze the errors $\|\lambda^{(j)} - \lambda_h^{(k)}\|_{0,\Gamma}$, $j, k = 0, 1$, corresponding
to the sequence of uniform refinements. Here, $\lambda^{(0)}$ and $\lambda^{(1)}$ are the piecewise
constant and the piecewise linear approximations to the Lagrange multiplier
corresponding to the parameter $h = 1/256$. Graphs of the unprojected
approximation $\tilde{\lambda}_h^{(1)}$ and its projection $\lambda_h^{(1)}$ with
$h = 1/256$ are provided in Figures 3.3 and 3.4. Note the peaks of $\tilde{\lambda}_h^{(1)}$ around
$x_1 = 0.1$ and $x_1 = 0.5$ (upper left plot), and $x_2 = 0.3$ (upper right plot), which
are eliminated by the projection of $\tilde{\lambda}_h^{(1)}$ onto the interval $[-1, 1]$. Figures 3.5
and 3.6 contain the graphs of $\tilde{\lambda}_h^{(0)}$ and $\lambda_h^{(0)}$ corresponding to $h = 1/256$.

Fig. 3.2 Initial and adaptively refined partitions.

Fig. 3.3 Plots of $\tilde{\lambda}_h^{(1)}$ (panels: sides $[0,1] \times \{0\}$, $\{1\} \times [0,1]$, $[0,1] \times \{1\}$, $\{0\} \times [0,1]$).

Fig. 3.4 Plots of $\lambda_h^{(1)}$ (panels: sides $[0,1] \times \{0\}$, $\{1\} \times [0,1]$, $[0,1] \times \{1\}$, $\{0\} \times [0,1]$).

Fig. 3.5 Plots of $\tilde{\lambda}_h^{(0)}$.

Fig. 3.6 Plots of $\lambda_h^{(0)}$.

Figure 3.7 provides the error values $\|u - u_h^{\mathrm{un}}\|_{1,\Omega}$ and $h^{1/2} \|\lambda^{(j)} - \lambda_h^{(k)}\|_{0,\Gamma}$,
$j, k = 0, 1$. The numerical convergence orders of $h^{1/2} \|\lambda^{(j)} - \lambda_h^{(k)}\|_{0,\Gamma}$ are
obviously higher than that of $\|u - u_h^{\mathrm{un}}\|_{1,\Omega}$, indicating that the second term
within the parentheses in the efficiency bounds (3.57) and (3.82) is expected
to be of higher order compared to the first term $\|e_h\|_{1,\Omega}^2$.

Fig. 3.7 Errors $\|u - u_h^{\mathrm{un}}\|_{1,\Omega}$ (□) versus $h^{1/2} \|\lambda^{(j)} - \lambda_h^{(k)}\|_{0,\Gamma}$ ($j, k = 0, 1$); error plotted against the number of degrees of freedom.

We use an adaptive procedure based on both residual type and gradient
recovery type estimates to obtain a sequence of approximate solutions $u_h^{\mathrm{ad}}$.
The adaptive finite element mesh after five adaptive iterations is shown on
the right plot in Figure 3.2. Figures 3.8 and 3.9 contain the error values
$\|u - u_h^{\mathrm{un}}\|_{1,\Omega}$ and $\|u - u_h^{\mathrm{ad}}\|_{1,\Omega}$. We observe a substantial improvement of
the efficiency using adaptively refined meshes. Figures 3.10 and 3.11 provide
the values of $\eta_I = \big( \sum_K \eta_{I,K}^2 \big)^{1/2}$, $I \in \{R, G\}$, where $\eta_{I,K}$ are computed using
either the residual type estimator ($I = R$) or the gradient recovery type estimator
($I = G$) on both uniform and adapted meshes. Table 3.2 contains the values
of $C_I$ computed for uniform and adaptive solutions:

\[
C_I = \frac{\|u - u_h\|_{1,\Omega}}{\eta_I}.
\]


Table 3.2 Numerical values of $C_R$ and $C_G$

It is seen from Table 3.2 that for this numerical example, the gradient
recovery type error estimator provides a better prediction of the true error
than the residual type error estimator, a phenomenon observed in numerous
references. In general, we use the a posteriori error estimates only for the
purpose of designing adaptive meshes, due to the presence of the unknown
constants $C_R$ and $C_G$.

Fig. 3.8 Results based on residual type estimator (error versus number of degrees of freedom; uniform and adapted meshes).

Fig. 3.9 Results based on recovery type estimator (error versus number of degrees of freedom; uniform and adapted meshes).

Fig. 3.10 Results based on residual type estimator (error and $\eta_R$ versus number of degrees of freedom; uniform and adapted meshes).

Fig. 3.11 Results based on recovery type estimator (error and $\eta_G$ versus number of degrees of freedom; uniform and adapted meshes).

Fig. 3.12 Performance comparison of the two error estimators (error versus number of degrees of freedom; adapted meshes, residual and gradient recovery).


To compare the performance of the two error estimators, we
show in Figure 3.12 the errors of the adaptive solutions corresponding to the
two error estimators. We observe that the two error estimators lead to very
similar solution accuracy for the same number of degrees of freedom.


More numerical results can be found in [14]. 


3.7 Application to a Frictional Contact Problem 


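In this section, we take a frictional contact problem as an example to show
that similar a posteriori error estimates can be derived for more complicated
variational inequalities. Again, we denote by $\Omega \subset \mathbb{R}^d$ ($d \le 3$ in applications)
a Lipschitz domain with boundary $\Gamma$. The outward unit normal exists a.e. on
$\Gamma$ and is denoted by $\boldsymbol{\nu}$. We use $\mathbb{S}^d$ for the space of second-order symmetric
tensors on $\mathbb{R}^d$. The canonical inner products and corresponding norms on $\mathbb{R}^d$
and $\mathbb{S}^d$ are

\[
\mathbf{u} \cdot \mathbf{v} = u_i v_i, \quad |\mathbf{v}| = (\mathbf{v} \cdot \mathbf{v})^{1/2} \quad \forall \mathbf{u}, \mathbf{v} \in \mathbb{R}^d,
\]

\[
\boldsymbol{\sigma} : \boldsymbol{\xi} = \sigma_{ij} \xi_{ij}, \quad |\boldsymbol{\sigma}| = (\boldsymbol{\sigma} : \boldsymbol{\sigma})^{1/2} \quad \forall \boldsymbol{\sigma}, \boldsymbol{\xi} \in \mathbb{S}^d.
\]

Here, the indices $i$ and $j$ run between 1 and $d$, and the summation convention
over repeated indices is used. We define the product spaces $\mathbf{L}^2(\omega) := (L^2(\omega))^d$
and $\mathbf{H}^1(\omega) := (H^1(\omega))^d$ equipped with the norms $\|\mathbf{v}\|_{k,\omega}^2 = \sum_{i=1}^d \|v_i\|_{k,\omega}^2$,
$k = 0, 1$. When no ambiguity occurs, we use the same notation $\mathbf{v}$ to denote
the function and its trace on the boundary. For a vector $\mathbf{v}$, we use its normal
component $v_\nu = \mathbf{v} \cdot \boldsymbol{\nu}$ and tangential component $\mathbf{v}_\tau = \mathbf{v} - v_\nu \boldsymbol{\nu}$ at a point on the
boundary. Similarly for a tensor $\boldsymbol{\sigma} \in \mathbb{S}^d$, we define its normal component $\sigma_\nu =
\boldsymbol{\sigma} \boldsymbol{\nu} \cdot \boldsymbol{\nu}$ and tangential component $\boldsymbol{\sigma}_\tau = \boldsymbol{\sigma} \boldsymbol{\nu} - \sigma_\nu \boldsymbol{\nu}$. For a detailed treatment
of traces for vector and tensor fields in contact problems and related spaces
see [53] or [46].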
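
The normal and tangential decompositions just introduced can be sketched numerically as follows; the function names are hypothetical, and the normal vector is assumed to have unit length.

```python
import numpy as np

def vector_components(v, nu):
    """Normal component v_nu = v . nu and tangential part v_tau = v - v_nu * nu."""
    v = np.asarray(v, dtype=float)
    nu = np.asarray(nu, dtype=float)   # assumed to be a unit vector
    v_nu = float(v @ nu)
    return v_nu, v - v_nu * nu

def stress_components(sigma, nu):
    """Normal component (sigma nu) . nu and tangential part sigma nu - sigma_nu * nu."""
    sigma = np.asarray(sigma, dtype=float)
    nu = np.asarray(nu, dtype=float)   # assumed to be a unit vector
    traction = sigma @ nu
    sigma_nu = float(traction @ nu)
    return sigma_nu, traction - sigma_nu * nu
```

In both cases the tangential part is orthogonal to the normal by construction.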

The material occupying $\Omega$ is assumed linearly elastic. We denote by $\mathcal{C} :
\Omega \times \mathbb{S}^d \to \mathbb{S}^d$ the elasticity tensor of the material. The fourth-order tensor $\mathcal{C}$
is assumed to be bounded, symmetric, and positive definite in $\Omega$.

We briefly describe the physical setting of the frictional contact problem.
Details and other related problems can be found in [53, 46]. Consider an
elastic body occupying a bounded domain $\Omega$ in $\mathbb{R}^d$, $d \le 3$, with a Lipschitz
boundary. The boundary $\Gamma$ is partitioned as follows: $\Gamma = \bar{\Gamma}_D \cup \bar{\Gamma}_N \cup \bar{\Gamma}_C$ with
$\Gamma_D$, $\Gamma_N$, and $\Gamma_C$ relatively open and mutually disjoint, and $\mathrm{meas}(\Gamma_D) > 0$.
The subscripts “D”, “N”, and “C” are intended as shorthand indications for
Dirichlet, Neumann, and contact boundary conditions. We assume that the
body is clamped on $\Gamma_D$; on the boundary part $\Gamma_N$ surface tractions of density
$\mathbf{f}_2 \in (L^2(\Gamma_N))^d$ are applied and on $\Gamma_C$ the body is in bilateral contact with
a rigid foundation. The contact is frictional and is modeled by Tresca's law.
Volume forces of density $\mathbf{f}_1 \in (L^2(\Omega))^d$ act in $\Omega$.

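In classical formulation, the problem is to find a displacement field $\mathbf{u} :
\Omega \to \mathbb{R}^d$ and a stress field $\boldsymbol{\sigma} : \Omega \to \mathbb{S}^d$ such that

\[
\begin{aligned}
\boldsymbol{\sigma} &= \mathcal{C} \boldsymbol{\varepsilon}(\mathbf{u}) && \text{in } \Omega, && (3.86) \\
\boldsymbol{\varepsilon}(\mathbf{u}) &= \tfrac{1}{2} \big( \nabla \mathbf{u} + (\nabla \mathbf{u})^T \big) && \text{in } \Omega, && (3.87) \\
\mathrm{Div}\, \boldsymbol{\sigma} + \mathbf{f}_1 &= \mathbf{0} && \text{in } \Omega, && (3.88) \\
\mathbf{u} &= \mathbf{0} && \text{on } \Gamma_D, && (3.89) \\
\boldsymbol{\sigma} \boldsymbol{\nu} &= \mathbf{f}_2 && \text{on } \Gamma_N, && (3.90) \\
u_\nu &= 0 && \text{on } \Gamma_C, && (3.91) \\
|\boldsymbol{\sigma}_\tau| &\le g && \text{on } \Gamma_C, && (3.92) \\
|\boldsymbol{\sigma}_\tau| < g &\Rightarrow \mathbf{u}_\tau = \mathbf{0} && \text{on } \Gamma_C, && (3.93) \\
|\boldsymbol{\sigma}_\tau| = g &\Rightarrow \mathbf{u}_\tau = -k \boldsymbol{\sigma}_\tau \ \text{for some } k \ge 0 && \text{on } \Gamma_C, && (3.94)
\end{aligned}
\]

where the friction bound $g > 0$ on $\Gamma_C$. We comment that (3.86) is the
constitutive relation of the linearized elasticity material, (3.87) defines the
linearized strain in terms of the displacement, and (3.88) is the equilibrium
equation. The classical displacement and traction boundary conditions are given
in (3.89) and (3.90). Contact conditions are described in (3.91)-(3.94). The
bilateral contact feature is reflected by (3.91). The relations (3.92)-(3.94)
represent Tresca's friction law.

In certain situations, the frictional contact problem stated above describes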
the material deformation quite accurately. In more complicated situations, 
such as when the contact zone is not prescribed a priori or when more realistic 
frictional contact laws are used, the frictional contact problem here can be 
viewed as an intermediate problem for a typical step in an iterative solution 
procedure for solving the more complicated contact problem. 

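To introduce a variational formulation of the problem, we let

\[
V = \{ \mathbf{v} \in \mathbf{H}^1(\Omega) : \mathbf{v}|_{\Gamma_D} = \mathbf{0}, \ v_\nu|_{\Gamma_C} = 0 \}
\]

with its inner product and norm defined by

\[
(\mathbf{u}, \mathbf{v})_V = \int_\Omega \boldsymbol{\varepsilon}(\mathbf{u}) : \boldsymbol{\varepsilon}(\mathbf{v}) \, dx, \qquad \|\mathbf{v}\|_V = (\mathbf{v}, \mathbf{v})_V^{1/2}.
\]

Because $\mathrm{meas}(\Gamma_D) > 0$, the Korn inequality holds and we see that $\|\mathbf{v}\|_V$ is
a norm over $V$ which is equivalent to the canonical norm $\|\mathbf{v}\|_{1,\Omega}$. Over the
space $V$, we define

\[
a(\mathbf{u}, \mathbf{v}) = \int_\Omega \mathcal{C} \boldsymbol{\varepsilon}(\mathbf{u}) : \boldsymbol{\varepsilon}(\mathbf{v}) \, dx, \tag{3.95}
\]

\[
\ell(\mathbf{v}) = \int_\Omega \mathbf{f}_1 \cdot \mathbf{v} \, dx + \int_{\Gamma_N} \mathbf{f}_2 \cdot \mathbf{v} \, ds, \tag{3.96}
\]

\[
j(\mathbf{v}) = \int_{\Gamma_C} g |\mathbf{v}_\tau| \, ds. \tag{3.97}
\]

A standard procedure leads to the following variational formulation of the
problem (3.86)-(3.94):

\[
\mathbf{u} \in V, \quad a(\mathbf{u}, \mathbf{v} - \mathbf{u}) + j(\mathbf{v}) - j(\mathbf{u}) \ge \ell(\mathbf{v} - \mathbf{u}) \quad \forall \mathbf{v} \in V. \tag{3.98}
\]

The bilinear form $a(\cdot, \cdot) : V \times V \to \mathbb{R}$ is obviously continuous. Due to the
assumption $\mathrm{meas}(\Gamma_D) > 0$, it is $V$-elliptic. The functional $\ell : V \to \mathbb{R}$ is
linear and continuous, and $j : V \to \mathbb{R}$ is proper and lower semicontinuous
convex. Therefore, the variational inequality (3.98) has a unique solution $\mathbf{u}$
in $V$. Moreover, because the bilinear form $a(\cdot, \cdot)$ is symmetric and positive
definite, solving the variational inequality (3.98) is equivalent to minimizing
the energy functional

\[
J(\mathbf{v}) = \frac{1}{2} a(\mathbf{v}, \mathbf{v}) - \ell(\mathbf{v}) + j(\mathbf{v})
\]

over the space $V$. The unique solution $\mathbf{u} \in V$ of the problem (3.98) is
characterized by the existence of a Lagrange multiplier $\boldsymbol{\lambda}_\tau \in (L^\infty(\Gamma_C))^d$ such
that

\[
a(\mathbf{u}, \mathbf{v}) + \int_{\Gamma_C} g \boldsymbol{\lambda}_\tau \cdot \mathbf{v}_\tau \, ds = \ell(\mathbf{v}) \quad \forall \mathbf{v} \in V, \tag{3.99}
\]

\[
|\boldsymbol{\lambda}_\tau| \le 1, \quad \boldsymbol{\lambda}_\tau \cdot \mathbf{u}_\tau = |\mathbf{u}_\tau| \quad \text{a.e. on } \Gamma_C. \tag{3.100}
\]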


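Turn now to a finite element approximation of the problem. Assume $\Omega$ has
a polyhedral boundary $\Gamma$. Let $\{\mathcal{P}_h\}$ be a family of partitions of the domain
$\Omega$ into straight-sided elements, and let $V_h \subset V$ be the associated standard
finite element spaces of continuous piecewise polynomials of certain degree.
Corresponding to the partition $\mathcal{P}_h$, we use the symbols $h_K$, $h_\gamma$, $\mathcal{E}(K)$, $\mathcal{E}_h$, $\mathcal{E}_{h,0}$
as introduced in Section 3.2. In addition, we use $\mathcal{E}_{h,\Gamma_N}$, $\mathcal{E}_{h,\Gamma_C}$ with obvious
meanings. The discrete formulation of the variational inequality (3.98) reads:
find $\mathbf{u}_h \in V_h$ such that

\[
a(\mathbf{u}_h, \mathbf{v}_h - \mathbf{u}_h) + j(\mathbf{v}_h) - j(\mathbf{u}_h) \ge \ell(\mathbf{v}_h - \mathbf{u}_h) \quad \forall \mathbf{v}_h \in V_h. \tag{3.101}
\]

Like the continuous variational inequality (3.98), the discrete problem (3.101)
has a unique solution $\mathbf{u}_h \in V_h$, and it is characterized by the existence of
$\boldsymbol{\lambda}_{h\tau} \in (L^\infty(\Gamma_C))^d$ such that

\[
a(\mathbf{u}_h, \mathbf{v}_h) + \int_{\Gamma_C} g \boldsymbol{\lambda}_{h\tau} \cdot \mathbf{v}_{h\tau} \, ds = \ell(\mathbf{v}_h) \quad \forall \mathbf{v}_h \in V_h, \tag{3.102}
\]

\[
|\boldsymbol{\lambda}_{h\tau}| \le 1, \quad \boldsymbol{\lambda}_{h\tau} \cdot \mathbf{u}_{h\tau} = |\mathbf{u}_{h\tau}| \quad \text{a.e. on } \Gamma_C, \tag{3.103}
\]

where $\mathbf{v}_{h\tau}$ and $\mathbf{u}_{h\tau}$ denote the tangential components of $\mathbf{v}_h$ and $\mathbf{u}_h$,
respectively. Analysis of the finite element approximation of such problems in the
general context of variational inequalities is extensively discussed in [34, 35].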
In the context of finite element approximations of a problem more general 
than the one considered in this chapter, one can find in [46, Section 8.2] an 
optimal-order a priori error estimate under additional solution regularity, and 
a convergence result without any additional solution regularity assumption. 

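Similar to the result in Section 3.4, we have the residual type error estimate

\[
\|\mathbf{e}\|_V^2 \le C \eta_R^2, \qquad \eta_R^2 = \sum_{K \in \mathcal{P}_h} \eta_{R,K}^2,
\]

where

\[
\eta_{R,K}^2 = h_K^2 \|\mathbf{r}_K\|_{0,K}^2 + \frac{1}{2} h_K \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,0}} \|\mathbf{R}_\gamma\|_{0,\gamma}^2 + h_K \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_N}} \|\mathbf{R}_\gamma\|_{0,\gamma}^2 + h_K \sum_{\gamma \in \mathcal{E}(K) \cap \mathcal{E}_{h,\Gamma_C}} \|\mathbf{R}_\gamma\|_{0,\gamma}^2
\]

and $\mathbf{r}_K$ and $\mathbf{R}_\gamma$ are the interior and side residuals, respectively, defined by
($\boldsymbol{\sigma}_h = \mathcal{C} \boldsymbol{\varepsilon}(\mathbf{u}_h)$):

\[
\mathbf{r}_K = \mathrm{Div}\, \boldsymbol{\sigma}_h + \mathbf{f}_1 \quad \text{in } K \in \mathcal{P}_h,
\]

\[
\mathbf{R}_\gamma = \begin{cases} -[\boldsymbol{\sigma}_h \boldsymbol{\nu}]_\gamma & \text{if } \gamma \in \mathcal{E}_{h,0}, \\ \mathbf{f}_2 - \boldsymbol{\sigma}_h \boldsymbol{\nu} & \text{if } \gamma \in \mathcal{E}_{h,\Gamma_N}, \\ -g \boldsymbol{\lambda}_{h\tau} - \boldsymbol{\sigma}_{h\tau} & \text{if } \gamma \in \mathcal{E}_{h,\Gamma_C}. \end{cases}
\]

Moreover,

\[
\eta_R^2 \le C \Big( \|\mathbf{e}\|_V^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_C}} h_\gamma \|\boldsymbol{\lambda}_\tau - \boldsymbol{\lambda}_{h\tau}\|_{0,\gamma}^2 + \sum_{K \in \mathcal{P}_h} h_K^2 \|\mathbf{f}_1 - \mathbf{f}_{1,K}\|_{0,K}^2 + \sum_{\gamma \in \mathcal{E}_{h,\Gamma_C}} h_\gamma \|\boldsymbol{\lambda}_{h\tau} - \boldsymbol{\lambda}_{h\tau,\gamma}\|_{0,\gamma}^2 \Big)
\]

with discontinuous piecewise polynomial approximations $\mathbf{f}_{1,K}$, $\boldsymbol{\lambda}_{h\tau,\gamma}$ of
$\mathbf{f}_1$, $\boldsymbol{\lambda}_{h\tau}$, respectively.

The results in Section 3.5 on recovery-based error estimates can be similarly
extended to the finite element solution of the frictional contact problem
(3.98).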

We now present numerical results on two two-dimensional problems. In 
both examples, body forces are assumed to be negligible and the body is 
in bilateral frictional contact with a rigid foundation on the part Ic. The 
friction is modeled by Tresca's law with a given slip bound $g$. We use the
adaptive algorithm stated at the beginning of Section 3.6, with $\mu = 1$.


Example 3.2. The physical setting of this example is shown in Figure 3.13.
The domain $\Omega = (0,4) \times (0,4)$ is the cross-section of a three-dimensional
linearly elastic body and plane stress condition is assumed. On the part $\Gamma_D =
\{4\} \times (0,4)$ the body is clamped. Oblique tractions act on the part $\{0\} \times (0,4)$
and the part $(0,4) \times \{4\}$ is traction free. Thus $\Gamma_N = (\{0\} \times (0,4)) \cup ((0,4) \times
\{4\})$. The contact part of the boundary is $\Gamma_C = (0,4) \times \{0\}$.

Fig. 3.13 Problem setting for Example 3.2.

Fig. 3.14 Initial mesh.

The elasticity tensor $\mathcal{C}$ satisfies

\[
(\mathcal{C} \boldsymbol{\varepsilon})_{ij} = \frac{E \nu}{1 - \nu^2} (\varepsilon_{11} + \varepsilon_{22}) \delta_{ij} + \frac{E}{1 + \nu} \varepsilon_{ij}, \quad 1 \le i, j \le 2,
\]

where $E$ is the Young's modulus, $\nu$ is the Poisson's ratio of the material, and
$\delta_{ij}$ is the Kronecker symbol. We use the following data (the unit daN/mm²
stands for “decanewtons per square millimeter”):

\[
\begin{aligned}
E &= 1500 \ \mathrm{daN/mm^2}, \\
\nu &= 0.4, \\
\mathbf{f}_1 &= (0, 0) \ \mathrm{daN/mm^2}, \\
\mathbf{f}_2(x_1, x_2) &= (150(5 - x_2), -75) \ \mathrm{daN/mm^2}, \\
g &= 450 \ \mathrm{daN/mm^2}.
\end{aligned}
\]

The initial uniform triangulation $P_1$ (128 elements, 81 nodes) is shown in Figure 3.14, with the interval $[0,4]$ divided into $4/h$ equal parts, $h = 1/2$. The mesh size $h$ is then successively halved, and the numerical solution corresponding to $h = 1/64$ is taken as the “true” solution $u$. To have an idea of the convergence behavior of the discrete Lagrange multipliers, we compute the errors $\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$ corresponding to the sequence of uniform refinements. Here, $\lambda_\tau$ is the Lagrange multiplier corresponding to the parameter $h = 1/64$. Figure 3.15 provides a comparison of the errors $\|u - u_h^{un}\|_V$ and $h^{1/2}\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$. The numerical convergence order of $h^{1/2}\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$ is visibly higher than that of $\|u - u_h^{un}\|_V$, indicating that the second term in the efficiency bound is expected to be of higher order compared to the first term $\|u - u_h\|_V^2$. Note that in the two-dimensional case, $\lambda_\tau$ and $\lambda_{h\tau}$ are scalar-valued functions, also denoted by $\lambda$ and $\lambda_h$. The (approximate) calculation of $\lambda_h$ follows the procedure described in Section 3.6. Graphs of $\lambda_{h,1}$ and $\lambda_{h,2}$ with $h = 1/64$ are provided in Figures 3.16 and 3.17.
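The numerical convergence orders mentioned above can be read off from errors on successively refined meshes. The following is a minimal sketch, assuming the error behaves like $e \approx C h^p$; the function name `observed_orders` and the sample data are illustrative and are not taken from the examples of this chapter.

```python
import math


def observed_orders(h_values, errors):
    """Estimate p in e ~ C*h**p from each pair of consecutive refinement levels."""
    orders = []
    for i in range(len(h_values) - 1):
        ratio_e = errors[i] / errors[i + 1]
        ratio_h = h_values[i] / h_values[i + 1]
        orders.append(math.log(ratio_e) / math.log(ratio_h))
    return orders


# Synthetic first-order data (hypothetical): e = 3*h for h = 1/2, 1/4, 1/8.
h_values = [0.5, 0.25, 0.125]
errors = [3.0 * h for h in h_values]
orders = observed_orders(h_values, errors)
```

Applying the same function to the two error sequences plotted in a figure such as Figure 3.15 gives a quantitative version of the statement that one quantity converges faster than the other.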
Fig. 3.15 Errors $\|u - u_h^{un}\|_V$ (□) versus $h^{1/2}\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$ (△).


We use an adaptive procedure based on both residual type and recovery type estimates to obtain a sequence of approximate solutions $u_h^{ad}$. The deformed configuration and the adaptive finite element mesh after four adaptive iterations are shown in Figures 3.18 (based on the residual type estimator, 5583 elements, 2921 nodes) and 3.19 (based on the recovery type estimator, 5437 elements, 2832 nodes). Figures 3.20 and 3.21 contain the error values $\|u - u_h^{un}\|_V$ and $\|u - u_h^{ad}\|_V$. We observe a substantial improvement of the efficiency using adaptively refined meshes. Figures 3.22 and 3.23 provide the values of $\eta_I = \big(\sum_{K\in P_h}\eta_{K,I}^2\big)^{1/2}$, $I \in \{R, G\}$, where $\eta_{K,I}$ are computed using either the residual type estimator ($I = R$) or the recovery type estimator ($I = G$) on both uniform and adapted meshes. Table 3.3 contains the values of $C_I$ computed for uniform and adaptive solutions:


Gee A Te{R,G}. (3.104) 
I 


Table 3.3 Numerical values of Cr and Cg 


7 Ae-04]6, 906-04 
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Fig. 3.16 Plot of An 1. 


Fig. 3.17 Plot of An 2. 


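Additional numerical experiments have been carried out in order to show the influence of the discrete Lagrange multipliers $\lambda_h$ on the adaptive solution. To this end, we associate the residuals $R_\gamma$ corresponding to the sides $\gamma \in E_{h,\Gamma_C}$ with a weighting parameter $\Theta$:

\[
\eta_{R,\Theta}^2 = \sum_{K\in P_h} \eta_{K,R,\Theta}^2,
\]

\[
\eta_{K,R,\Theta}^2 = h_K^2\|r_K\|_{0,K}^2
+ \frac{1}{2}\,h_K \sum_{\gamma\in E(K)\cap E_{h,\Omega}} \|R_\gamma\|_{0,\gamma}^2
+ h_K \sum_{\gamma\in E(K)\cap E_{h,\Gamma_N}} \|R_\gamma\|_{0,\gamma}^2
+ \Theta\,h_K \sum_{\gamma\in E(K)\cap E_{h,\Gamma_C}} \|R_\gamma\|_{0,\gamma}^2.
\]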
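To make the role of the weighting parameter concrete, the following minimal sketch evaluates a weighted element indicator from precomputed squared residual norms. It assumes the indicator has the form $h_K^2\|r_K\|^2 + \tfrac12 h_K\sum_{\text{interior}}\|R_\gamma\|^2 + h_K\sum_{\Gamma_N}\|R_\gamma\|^2 + \Theta h_K\sum_{\Gamma_C}\|R_\gamma\|^2$; the function name, argument names, and sample numbers are hypothetical.

```python
def weighted_indicator_sq(h_K, r_norm_sq, interior_sq, neumann_sq, contact_sq, theta=1.0):
    """Squared weighted residual indicator on one element.

    h_K          element size
    r_norm_sq    squared L2 norm of the interior residual on the element
    interior_sq  squared L2 norms of side residuals on interior sides
    neumann_sq   squared L2 norms of side residuals on traction sides
    contact_sq   squared L2 norms of side residuals on contact sides
    theta        weight applied to the contact (Lagrange multiplier) part
    """
    return (
        h_K ** 2 * r_norm_sq
        + 0.5 * h_K * sum(interior_sq)
        + h_K * sum(neumann_sq)
        + theta * h_K * sum(contact_sq)
    )


# Hypothetical data for one element with two interior sides and one contact side.
eta_full = weighted_indicator_sq(0.5, 4.0, [2.0, 2.0], [], [8.0], theta=1.0)
eta_damped = weighted_indicator_sq(0.5, 4.0, [2.0, 2.0], [], [8.0], theta=0.1)
```

With these sample values the contact part dominates for a weight of 1 and is nearly switched off for a weight of 0.1, which is the effect the experiment is designed to probe.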
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Fig. 3.19 Deformed configuration based on recovery type estimator. 
Fig. 3.20 Residual type estimator: $\|u - u_h^{un}\|_V$ (□) versus $\|u - u_h^{ad}\|_V$ (△).

Fig. 3.21 Recovery type estimator: $\|u - u_h^{un}\|_V$ (□) versus $\|u - u_h^{ad}\|_V$ (△).
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Fig. 3.22 Results based on residual type estimator. 
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Fig. 3.23 Results based on recovery type estimator. 
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Table 3.4 Residual type estimator. Influence of Lagrange multiplier part on adaptive 


solution 


[Table 3.4: rows for $\Theta = 1$, $\Theta = 0.1$, $\Theta = 0.01$.]


We observe that the smaller the parameter $\Theta$ is, the less influence there is from the Lagrange multiplier part on the error estimator and hence on the adaptive mesh. An adaptive procedure is performed based on the error indicators $\eta_{R,\Theta}$ with $\Theta = 1, 0.1, 0.01$. Numerical results are summarized in Table 3.4. Similar numerical results are obtained for the gradient recovery type estimator.


Example 3.3. The physical setting of this example is shown in Figure 3.24. The domain $\Omega = (0,10)\times(0,2)$ is the cross-section of a three-dimensional linearly elastic body with the plane stress condition assumed. On the part $\Gamma_D = (0,10)\times\{2\}$ the body is clamped. Horizontal tractions act on the part $\{0\}\times(0,2)$ and oblique tractions on $\{10\}\times(0,2)$. Here $\Gamma_C = (0,10)\times\{0\}$.

Fig. 3.24 Problem setting for Example 3.3.
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Fig. 3.25 Initial mesh. 


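The following data are used:

\[
\begin{aligned}
E &= 1000\ \text{daN/mm}^2,\\
\nu &= 0.3,\\
f_1 &= (0,0)\ \text{daN/mm}^2,\\
f_2(x_1,x_2) &= \begin{cases}
(500,\,0)\ \text{daN/mm}^2 & \text{on } \{0\}\times(0,2),\\
(950x_2 - 750,\,-100)\ \text{daN/mm}^2 & \text{on } \{10\}\times(0,2),
\end{cases}\\
g &= 175\ \text{daN/mm}^2.
\end{aligned}
\]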


We start with a coarse uniform triangulation shown in Figure 3.25, with 160 triangular elements and 105 nodes. For uniform triangulations, we divide the interval $[0,10]$ into $10/h$ equal parts and the interval $[0,2]$ into $2/h$ equal parts. The numerical solution corresponding to $h = 1/64$ is taken as the “true” solution $u$. Figures 3.26 (based on the residual type estimator, 5222 elements, 2755 nodes) and 3.27 (based on the recovery type estimator, 4964 elements, 2624 nodes) show the approximate solution and refined mesh after four consecutive refinements. Graphs of $\lambda_{h,1}$ and $\lambda_{h,2}$ with $h = 1/64$ are provided in Figures 3.28 and 3.29. Again, we compute the errors $\|u - u_h^{un}\|_V$, $\|u - u_h^{ad}\|_V$, $h^{1/2}\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$, and $\eta_I$, $I \in \{R, G\}$, whose values are provided in Figures 3.30-3.34. Table 3.5 contains the values of $C_I$, $I \in \{R, G\}$, defined in (3.104).


Table 3.5 Numerical values of Cr and Cg 
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Fig. 3.26 Deformed configuration based on residual type estimator. 
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Fig. 3.27 Deformed configuration based on recovery type estimator. 
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Fig. 3.28 Plot of An 1. 


Fig. 3.29 Plot of Ap 2. 
Fig. 3.30 Residual type estimator: $\|u - u_h^{un}\|_V$ (□) versus $\|u - u_h^{ad}\|_V$ (△).

Fig. 3.31 Recovery type estimator: $\|u - u_h^{un}\|_V$ (□) versus $\|u - u_h^{ad}\|_V$ (△).

Fig. 3.32 $\|u - u_h^{un}\|_V$ (□) versus $h^{1/2}\|\lambda_\tau - \lambda_{h\tau}\|_{0,\Gamma_C}$ (△).
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Fig. 3.33 Results based on residual type estimator. 
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Fig. 3.34 Results based on recovery type estimator. 
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3.8 Quasistatic Variational Inequalities and Their 
Discretizations 


The discussions in the previous sections on steady-state problems can be extended to the adaptive solution of time-dependent variational inequalities. We illustrate this on some quasistatic variational inequalities. We first introduce an abstract quasistatic variational inequality. As is seen below, this variational inequality covers several quasistatic contact problems. A similar quasistatic variational inequality also arises in the study of a primal formulation of some plasticity problems [42].

Let $X$ be a real Hilbert space with inner product $(\cdot,\cdot)_X$ and norm $\|\cdot\|_X$, $T > 0$, and $1 \le p \le \infty$. Assume $a : X \times X \to \mathbb{R}$ is a symmetric, continuous, coercive bilinear form, and $j : X \to \mathbb{R}$ is a continuous semi-norm on $X$. Let there be given $f \in W^{1,p}(0,T;X)$ and $u_0 \in X$ with the condition

\[
a(u_0, v) + j(v) \ge (f(0), v)_X \quad \forall\, v \in X.
\]


We consider the following variational inequality. 


Problem 3.1. Find $u : [0,T] \to X$ such that for a.e. $t \in (0,T)$,

\[
a(u(t), v - \dot u(t)) + j(v) - j(\dot u(t)) \ge (f(t), v - \dot u(t))_X \quad \forall\, v \in X, \tag{3.105}
\]

and

\[
u(0) = u_0. \tag{3.106}
\]


A proof of the following well-posedness result can be found in [46]. 


Theorem 3.11. Under the stated assumptions on the data, Problem 3.1 has a unique solution $u \in W^{1,p}(0,T;X)$.


Next, we consider a semidiscrete scheme for Problem 3.1. We need a partition of the time interval: $[0,T] = \bigcup_{n=1}^N [t_{n-1}, t_n]$ with $0 = t_0 < t_1 < \cdots < t_N = T$. Denote $k_n = t_n - t_{n-1}$ for the length of the subinterval $[t_{n-1}, t_n]$, and $k = \max_n k_n$ for the maximal step-size. For the given linear functional $f \in W^{1,p}(0,T;X)$ and the solution $u \in W^{1,p}(0,T;X)$, we use the notations $f_n = f(t_n)$ and $u_n = u(t_n)$, which are well-defined by the Sobolev embedding $W^{1,p}(0,T;X) \hookrightarrow C([0,T];X)$. The symbol $\delta_n u_n = (u_n - u_{n-1})/k_n$ is used to denote the backward divided difference.

Then a semidiscrete approximation of Problem 3.1 is the following. 


Problem 3.2. Find $u^k = \{u_n^k\}_{n=0}^N \subset X$ such that

\[
u_0^k = u_0 \tag{3.107}
\]

and for $n = 1, 2, \ldots, N$,

\[
a(u_n^k, v - \delta_n u_n^k) + j(v) - j(\delta_n u_n^k) \ge (f_n, v - \delta_n u_n^k)_X \quad \forall\, v \in X. \tag{3.108}
\]

We notice that under the assumptions of Theorem 3.11, Problem 3.2 has a unique solution. Let us briefly indicate how an error estimate can be derived for the semidiscrete solution defined in Problem 3.2. For this purpose, we assume $u \in H^2(0,T;X)$. Note that this assumption in particular implies that the solution $u$ satisfies the inequality (3.105) for all $t \in (0,T)$.

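Denote $e_n = u_n - u_n^k$ for the solution error, and let

\[
\|v\|_a = a(v,v)^{1/2}
\]

be the “energy” norm over $X$. We observe that from the assumptions on the bilinear form $a$, the quantity $\|v\|_a$ defines an equivalent norm on $X$. We take $v = \delta_n u_n^k$ in (3.105) at $t = t_n$, $v = \dot u_n$ in (3.108), and add the two inequalities to obtain

\[
a(u_n, \delta_n u_n^k - \dot u_n) + a(u_n^k, \dot u_n - \delta_n u_n^k) \ge 0.
\]

This relation can be rewritten as

\[
a(e_n, e_n - e_{n-1}) \le k_n\, a(e_n, \delta_n u_n - \dot u_n).
\]

Now,

\[
a(e_n, e_n - e_{n-1}) \ge \frac{1}{2}\big(\|e_n\|_a^2 - \|e_{n-1}\|_a^2\big), \qquad
a(e_n, \delta_n u_n - \dot u_n) \le \frac{1}{2}\big(\|e_n\|_a^2 + \|\delta_n u_n - \dot u_n\|_a^2\big).
\]

Hence,

\[
\|e_n\|_a^2 \le (1 - k_n)^{-1}\|e_{n-1}\|_a^2 + k_n (1 - k_n)^{-1}\|\delta_n u_n - \dot u_n\|_a^2.
\]

It is easy to verify that when $k \le 1/2$,

\[
(1 - k_n)^{-1} \le e^{2k_n}.
\]

Therefore,

\[
\|e_n\|_a^2 \le e^{2k_n}\|e_{n-1}\|_a^2 + k_n e^{2k_n}\|\delta_n u_n - \dot u_n\|_a^2.
\]

An inductive argument leads to

\[
\|e_n\|_a^2 \le e^{2(k_1 + \cdots + k_n)}\|e_0\|_a^2 + \sum_{j=0}^{n-1} k_{n-j}\, e^{2(k_n + \cdots + k_{n-j})}\|\delta_{n-j}u_{n-j} - \dot u_{n-j}\|_a^2.
\]

Because $e_0 = 0$ and $k_n + \cdots + k_{n-j} \le T$, we have

\[
\|e_n\|_a^2 \le c \sum_{j=0}^{n-1} k_{n-j}\|\delta_{n-j}u_{n-j} - \dot u_{n-j}\|_a^2.
\]

Under the smoothness assumption $u \in H^2(0,T;X)$, we have

\[
\|\delta_{n-j}u_{n-j} - \dot u_{n-j}\|_a \le c\, k_{n-j}^{1/2}\,\|\ddot u\|_{L^2(t_{n-j-1},\,t_{n-j};\,X)}.
\]

Thus, we have shown the error estimate

\[
\max_{1\le n\le N}\|u_n - u_n^k\|_X^2 \le c \sum_{j=1}^{N} k_j^2\,\|\ddot u\|_{L^2(t_{j-1},\,t_j;\,X)}^2. \tag{3.109}
\]

It then follows that

\[
\max_{1\le n\le N}\|u_n - u_n^k\|_X \le c\,k\,\|\ddot u\|_{L^2(0,T;X)}.
\]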
Note that the estimate (3.109) identifies the local contribution from each 
subinterval to the solution error, and therefore could be used to adjust the 
values of k; to achieve an efficient time discretization. 
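As an illustration of how such local contributions might be inspected, the sketch below assumes that the contribution of the $j$-th subinterval has the form $k_j^2 m_j^2$, where $m_j$ is some available estimate of the $L^2$-in-time norm of $\ddot u$ on that subinterval. The generic constant is omitted, and the function names and sample numbers are hypothetical.

```python
import math


def step_contributions(steps, ddu_norms):
    """Local contributions k_j**2 * m_j**2 of each time subinterval."""
    return [k * k * m * m for k, m in zip(steps, ddu_norms)]


def global_bounds(steps, ddu_norms):
    """Return (localized bound, crude bound), both without the generic constant.

    localized: sqrt(sum_j k_j**2 * m_j**2)
    crude:     max_j k_j * sqrt(sum_j m_j**2)
    """
    localized = math.sqrt(sum(step_contributions(steps, ddu_norms)))
    crude = max(steps) * math.sqrt(sum(m * m for m in ddu_norms))
    return localized, crude


# Hypothetical data: a short step where the solution varies quickly,
# a long step where it varies slowly.
steps = [0.1, 0.4]
ddu_norms = [3.0, 1.0]
contributions = step_contributions(steps, ddu_norms)
localized, crude = global_bounds(steps, ddu_norms)
```

In this sample the localized quantity is noticeably smaller than the crude one, which is the kind of gain one hopes for when the step sizes are matched to the local behavior of the solution.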

Now we describe some contact problems that lead to a variational inequality of the type stated in Problem 3.1. We consider a quasistatic contact process of a linearly elastic body occupying a Lipschitz domain $\Omega$ in $\mathbb{R}^d$ ($d \le 3$ in applications). The boundary $\Gamma = \partial\Omega$ is partitioned as follows: $\Gamma = \overline{\Gamma_D} \cup \overline{\Gamma_N} \cup \overline{\Gamma_C}$ with $\Gamma_D$, $\Gamma_N$, and $\Gamma_C$ relatively open and mutually disjoint, and $\operatorname{meas}(\Gamma_D) > 0$. The time interval of interest is $(0,T)$. We assume that the body is clamped on $\Gamma_D$ during the time interval $(0,T)$; a volume force of density $f_1$ acts in $\Omega \times (0,T)$ and a surface traction of density $f_2$ is applied on $\Gamma_N \times (0,T)$. Then the displacement field $u : \Omega \times [0,T] \to \mathbb{R}^d$ and the stress field $\sigma : \Omega \times [0,T] \to \mathbb{S}^d$ satisfy the relations:


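\[
\begin{aligned}
\sigma &= C\varepsilon(u) && \text{in } \Omega \times (0,T), && (3.110)\\
\varepsilon(u) &= \tfrac{1}{2}\big(\nabla u + (\nabla u)^T\big) && \text{in } \Omega \times (0,T), && (3.111)\\
\operatorname{Div}\sigma + f_1 &= 0 && \text{in } \Omega \times (0,T), && (3.112)\\
u &= 0 && \text{on } \Gamma_D \times (0,T), && (3.113)\\
\sigma\nu &= f_2 && \text{on } \Gamma_N \times (0,T), && (3.114)\\
u(0) &= u_0 && \text{in } \Omega. && (3.115)
\end{aligned}
\]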


In (3.110), $C : \Omega \times \mathbb{S}^d \to \mathbb{S}^d$ denotes the fourth-order elasticity tensor of the material, and is assumed to be bounded, symmetric, and positive definite in $\Omega$. In (3.115), $u_0$ is the given initial displacement field. In (3.114), $\nu$ is the unit outward normal vector on the boundary $\Gamma$, which exists a.e. because $\Gamma$ is assumed to be Lipschitz continuous.

The relations (3.110)-(3.115) are to be supplemented by contact conditions on $\Gamma_C \times (0,T)$. We assume the contact is bilateral (no loss of contact during the process) and the friction is modeled with Tresca's friction law (see, e.g., [29, 59]):

\[
\left.
\begin{aligned}
&u_\nu = 0, \quad |\sigma_\tau| \le g,\\
&|\sigma_\tau| < g \ \Rightarrow\ \dot u_\tau = 0,\\
&|\sigma_\tau| = g \ \Rightarrow\ \exists\,\lambda \ge 0 \text{ s.t. } \sigma_\tau = -\lambda\,\dot u_\tau
\end{aligned}
\right\}
\quad \text{on } \Gamma_C \times (0,T). \tag{3.116}
\]

Here, $g > 0$ represents the friction bound function, $u_\nu$ and $u_\tau$ are the normal and tangential components of $u$ on $\Gamma$, and $\sigma_\tau$ is the tangential component of $\sigma$.

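To introduce a variational formulation of the problem, we need the space

\[
V = \{ v \in (H^1(\Omega))^d : v = 0 \text{ a.e. on } \Gamma_D,\ v_\nu = 0 \text{ a.e. on } \Gamma_C \}
\]

with the inner product and norm defined by

\[
(u, v)_V = \int_\Omega C\varepsilon(u) : \varepsilon(v)\,dx, \qquad \|u\|_V = \sqrt{(u,u)_V}.
\]

Assume that the volume force and traction densities satisfy

\[
f_1 \in W^{1,p}(0,T;(L^2(\Omega))^d), \qquad f_2 \in W^{1,p}(0,T;(L^2(\Gamma_N))^d), \tag{3.117}
\]

and let the constant $g > 0$ be given. We define over the space $V$,

\[
a(u, v) = \int_\Omega C\varepsilon(u) : \varepsilon(v)\,dx, \tag{3.118}
\]

\[
j(v) = \int_{\Gamma_C} g\,|v_\tau|\,ds, \tag{3.119}
\]

and denote by $l(t)$ the element of $V$ given by

\[
(l(t), v)_V = \int_\Omega f_1(t)\cdot v\,dx + \int_{\Gamma_N} f_2(t)\cdot v\,ds \quad \forall\, v \in V,\ t \in [0,T]. \tag{3.120}
\]

We assume that the initial data satisfy

\[
u_0 \in V, \tag{3.121}
\]

\[
a(u_0, v) + j(v) \ge (l(0), v)_V \quad \forall\, v \in V. \tag{3.122}
\]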

Then a standard procedure leads to the following variational formulation for the frictional contact problem (3.110)-(3.115) and (3.116).

Problem 3.3. Find $u : [0,T] \to V$ such that for a.e. $t \in (0,T)$,

\[
a(u(t), v - \dot u(t)) + j(v) - j(\dot u(t)) \ge (l(t), v - \dot u(t))_V \quad \forall\, v \in V, \tag{3.123}
\]

and

\[
u(0) = u_0. \tag{3.124}
\]

We observe that this is a special case of Problem 3.1. By Theorem 3.11, we see that under assumptions (3.117), (3.121), and (3.122), Problem 3.3 has a unique solution $u \in W^{1,p}(0,T;V)$.

Some other contact conditions lead to a similar variational inequality. For example, suppose the contact condition is described by a simplified version of Coulomb's law (see, e.g., [29, 59]):

\[
\left.
\begin{aligned}
&\sigma_\nu = S, \quad |\sigma_\tau| \le \mu|\sigma_\nu|,\\
&|\sigma_\tau| < \mu|\sigma_\nu| \ \Rightarrow\ \dot u_\tau = 0,\\
&|\sigma_\tau| = \mu|\sigma_\nu| \ \Rightarrow\ \exists\,\lambda \ge 0 \text{ s.t. } \sigma_\tau = -\lambda\,\dot u_\tau
\end{aligned}
\right\}
\quad \text{on } \Gamma_C \times (0,T). \tag{3.125}
\]

Here $S \in L^\infty(\Gamma_C)$ is a given function on $\Gamma_C$ and $\mu \in L^\infty(\Gamma_C)$ is the given coefficient of friction, $\mu > 0$ a.e. on $\Gamma_C$. Then the variational formulation is still of the form (3.123)-(3.124) with the same bilinear form $a$ and

\[
V = \{ v \in (H^1(\Omega))^d : v = 0 \text{ a.e. on } \Gamma_D \},
\]

\[
j(v) = \int_{\Gamma_C} \mu\,|S|\,|v_\tau|\,ds,
\]

\[
(l(t), v)_V = \int_\Omega f_1(t)\cdot v\,dx + \int_{\Gamma_N} f_2(t)\cdot v\,ds + \int_{\Gamma_C} S\,v_\nu\,ds.
\]

Analysis of this variational inequality is similar to that for the problem (3.123)-(3.124).

We now show how to derive a posteriori error estimates for finite element solutions of the temporally semidiscrete problem. We choose Problem 3.3 as the model quasistatic contact problem. Its temporal semidiscrete approximation is the following.

Problem 3.4. Find $u^k = \{u_n^k\}_{n=0}^N$, where $u_n^k \in V$, $0 \le n \le N$, $u_0^k = u_0$, such that for $n = 1, 2, \ldots, N$,

\[
a(u_n^k, v - \delta_n u_n^k) + j(v) - j(\delta_n u_n^k) \ge (l_n, v - \delta_n u_n^k)_V \quad \forall\, v \in V, \tag{3.126}
\]

where $l_n = l(t_n)$.

Denote $w_n^k = \delta_n u_n^k$. Then $u_n^k = u_{n-1}^k + k_n w_n^k$ and we can express the inequality problem (3.126) in terms of $w_n^k$: Find $w_n^k \in V$ such that

\[
k_n\,a(w_n^k, v - w_n^k) + j(v) - j(w_n^k) \ge (l_n, v - w_n^k)_V - a(u_{n-1}^k, v - w_n^k) \quad \forall\, v \in V. \tag{3.127}
\]

This inequality is equivalent to the minimization problem:

\[
w_n^k \in V, \qquad J_n(w_n^k) = \inf_{v \in V} J_n(v), \tag{3.128}
\]

where $J_n$ is the functional

\[
J_n(v) = \frac{k_n}{2}\,a(v, v) + j(v) - (l_n, v)_V + a(u_{n-1}^k, v). \tag{3.129}
\]

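We now turn to a finite element approximation of Problem 3.4. We use the finite element space setting discussed in Section 3.7. For a given $v \in (L^1(\Omega))^d$, similar to (3.10)-(3.11), we define the interpolation operator $\Pi_h : V \to V^h$ as follows:

\[
\Pi_h v = \sum_{a \in N_{h,0}} v_a\,\varphi_a, \quad \text{where } v_a = \frac{\int_{\tilde K_a} v\,\varphi_a\,dx}{\int_{\tilde K_a} \varphi_a\,dx}. \tag{3.130}
\]

Applying Theorem 3.2, we have an $h$-independent constant $C > 0$ such that for all $v \in V$ and $f \in (L^2(\Omega))^d$,

\[
|v - \Pi_h v|_{1,\Omega} \le C\,|v|_{1,\Omega}, \tag{3.131}
\]

\[
\int_\Omega f\cdot(v - \Pi_h v)\,dx \le C\,|v|_{1,\Omega}\Big( \sum_{a \in N_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}^d}\|f - f_a\|_{0,\tilde K_a}^2 \Big)^{1/2}, \tag{3.132}
\]

\[
\sum_{K \in P_h} h_K^{-2}\,\|v - \Pi_h v\|_{0,K}^2 \le C\,|v|_{1,\Omega}^2, \tag{3.133}
\]

\[
\sum_{\gamma \in E_h} h_\gamma^{-1}\,\|v - \Pi_h v\|_{0,\gamma}^2 \le C\,|v|_{1,\Omega}^2. \tag{3.134}
\]

The finite element approximation of the inequality problem (3.126) is to find $u_n^{hk} \in V^h$ such that

\[
a(u_n^{hk}, v^h - \delta_n u_n^{hk}) + j(v^h) - j(\delta_n u_n^{hk}) \ge (l_n, v^h - \delta_n u_n^{hk})_V \quad \forall\, v^h \in V^h. \tag{3.135}
\]

Corresponding to the formulation (3.127), the finite element method is to find $w_n^{hk} \in V^h$ such that

\[
k_n\,a(w_n^{hk}, v^h - w_n^{hk}) + j(v^h) - j(w_n^{hk}) \ge (l_n, v^h - w_n^{hk})_V - a(u_{n-1}^{hk}, v^h - w_n^{hk}) \quad \forall\, v^h \in V^h. \tag{3.136}
\]

As in previous sections, $w_n^k$ is characterized by the existence of a unique $\lambda_{n\tau}^k \in L^\infty(\Gamma_C)$ such that

\[
k_n\,a(w_n^k, v) + \int_{\Gamma_C} g\,\lambda_{n\tau}^k\cdot v_\tau\,ds = (l_n, v)_V - a(u_{n-1}^k, v) \quad \forall\, v \in V, \tag{3.137}
\]

\[
|\lambda_{n\tau}^k| \le 1 \text{ a.e. on } \Gamma_C \quad \text{and} \quad \lambda_{n\tau}^k\cdot w_{n\tau}^k = |w_{n\tau}^k| \text{ a.e. on } \Gamma_C; \tag{3.138}
\]

$w_n^{hk}$ is characterized by the existence of a unique $\lambda_{n\tau}^{hk} \in L^\infty(\Gamma_C)$ such that

\[
k_n\,a(w_n^{hk}, v^h) + \int_{\Gamma_C} g\,\lambda_{n\tau}^{hk}\cdot v^h_\tau\,ds = (l_n, v^h)_V - a(u_{n-1}^{hk}, v^h) \quad \forall\, v^h \in V^h, \tag{3.139}
\]

\[
|\lambda_{n\tau}^{hk}| \le 1 \text{ a.e. on } \Gamma_C \quad \text{and} \quad \lambda_{n\tau}^{hk}\cdot w_{n\tau}^{hk} = |w_{n\tau}^{hk}| \text{ a.e. on } \Gamma_C. \tag{3.140}
\]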
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3.9 A Posteriori Error Estimates for the Quasistatic 
Contact Problem 


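We first provide a dual formulation for the problem (3.127). Let

\[
Q = (L^2(\Omega))^{d\times d} \times (L^2(\Gamma_C))^d.
\]

For $n = 1, 2, \ldots, N$, define $F_n : V \times Q \to \mathbb{R}$ by the formula

\[
F_n(v, q) = \int_\Omega \Big[ \frac{k_n}{2}\,Cq_1 : q_1 - f_{1n}\cdot v + C\varepsilon(u_{n-1}^k) : \varepsilon(v) \Big] dx + \int_{\Gamma_C} g\,|q_2|\,ds - \int_{\Gamma_N} f_{2n}\cdot v\,ds,
\]

where $q = (q_1, q_2) \in Q$, $f_{1n} = f_1(t_n)$, and $f_{2n} = f_2(t_n)$. Introduce a linear bounded operator $\Lambda : V \to Q$ by the relation

\[
\Lambda v = (\varepsilon(v), v_\tau) \quad \forall\, v \in V.
\]

Then for any $v \in V$,

\[
J_n(v) = F_n(v, \Lambda v),
\]

and the minimization problem (3.128) can be rewritten as

\[
w_n^k \in V, \qquad F_n(w_n^k, \Lambda w_n^k) = \inf_{v \in V} F_n(v, \Lambda v). \tag{3.141}
\]

Let $V^*$ and $Q^* = (L^2(\Omega))^{d\times d} \times (L^2(\Gamma_C))^d$ be the duals of $V$ and $Q$ placed in duality by the pairings $\langle\cdot,\cdot\rangle_V$ and $\langle\cdot,\cdot\rangle_Q$, respectively. We need to compute the conjugate function $F_n^*$ of $F_n$:

\[
F_n^*(\Lambda^* q^*, -q^*) = \sup_{v \in V,\, q \in Q} \big\{ \langle \Lambda^* q^*, v\rangle_V - \langle q^*, q\rangle_Q - F_n(v, q) \big\},
\]

where $\Lambda^* : Q^* \to V^*$ is the adjoint of $\Lambda$. We have

\[
\begin{aligned}
F_n^*(\Lambda^* q^*, -q^*) = \sup_{v \in V,\, q \in Q} \Big\{ & \int_\Omega \big[ (q_1^* - C\varepsilon(u_{n-1}^k)) : \varepsilon(v) + f_{1n}\cdot v \big] dx
- \int_\Omega \Big( q_1^* : q_1 + \frac{k_n}{2}\,Cq_1 : q_1 \Big) dx \\
& + \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} \big[ q_2^*\cdot v_\tau - (q_2^*\cdot q_2 + g\,|q_2|) \big] ds \Big\}.
\end{aligned}
\tag{3.142}
\]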

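Let $A : \Omega \times \mathbb{S}^d \to \mathbb{S}^d$ be the fourth-order tensor, inverse to $C$. Like $C$, $A$ is also bounded, symmetric, and positive definite in $\Omega$. Then we can show that

\[
F_n^*(\Lambda^* q^*, -q^*) = \begin{cases}
\displaystyle\int_\Omega \frac{1}{2k_n}\,Aq_1^* : q_1^*\,dx & \text{if } q^* \in Q^*_{n,g},\\[2mm]
+\infty & \text{otherwise},
\end{cases}
\]

where the admissible function set $Q^*_{n,g}$ consists of all $q^* = (q_1^*, q_2^*) \in Q^*$ such that $|q_2^*| \le g$ a.e. on $\Gamma_C$ and

\[
\int_\Omega \big[ (q_1^* - C\varepsilon(u_{n-1}^k)) : \varepsilon(v) + f_{1n}\cdot v \big] dx + \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} q_2^*\cdot v_\tau\,ds = 0 \quad \forall\, v \in V. \tag{3.143}
\]

The dual problem of (3.141) is to find $p^* \in Q^*_{n,g}$ such that

\[
-F_n^*(\Lambda^* p^*, -p^*) = \sup_{q^* \in Q^*_{n,g}} \big\{ -F_n^*(\Lambda^* q^*, -q^*) \big\}. \tag{3.144}
\]

Applying Theorem 3.1, we know that both problems (3.141) and (3.144) have unique solutions and the following duality relation holds:

\[
F_n(w_n^k, \Lambda w_n^k) = -F_n^*(\Lambda^* p^*, -p^*). \tag{3.145}
\]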


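As in Section 3.3, we first let $w_n^{hk} \in V$ be any approximation of $w_n^k$. By using (3.127) and (3.129) we obtain

\[
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2 \le J_n(w_n^{hk}) - J_n(w_n^k).
\]

Let $p^*$ be the solution of the dual problem (3.144). Relation (3.145) implies

\[
J_n(w_n^k) = F_n(w_n^k, \Lambda w_n^k) = -F_n^*(\Lambda^* p^*, -p^*) \ge -F_n^*(\Lambda^* q^*, -q^*) \quad \forall\, q^* \in Q^*_{n,g}.
\]

Therefore, for any $q^* = (q_1^*, q_2^*) \in Q^*_{n,g}$ and $r^* \in (L^2(\Omega))^{d\times d}$,

\[
\begin{aligned}
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2
&\le J_n(w_n^{hk}) + F_n^*(\Lambda^* q^*, -q^*) + \int_\Omega \frac{1}{2k_n}\,Ar^* : r^*\,dx - \int_\Omega \frac{1}{2k_n}\,Ar^* : r^*\,dx \\
&= \int_\Omega \frac{1}{2k_n}\,C(k_n\varepsilon(w_n^{hk}) + Ar^*) : (k_n\varepsilon(w_n^{hk}) + Ar^*)\,dx \\
&\quad + \int_\Omega \big[ C\varepsilon(u_{n-1}^k) : \varepsilon(w_n^{hk}) - f_{1n}\cdot w_n^{hk} \big] dx - \int_{\Gamma_N} f_{2n}\cdot w_n^{hk}\,ds \\
&\quad - \int_\Omega \varepsilon(w_n^{hk}) : r^*\,dx + \int_{\Gamma_C} g\,|w_{n\tau}^{hk}|\,ds + \int_\Omega \frac{1}{2k_n}\,(Aq_1^* : q_1^* - Ar^* : r^*)\,dx.
\end{aligned}
\tag{3.146}
\]

It follows immediately from (3.143) that for any $q^* = (q_1^*, q_2^*) \in Q^*_{n,g}$,

\[
\int_\Omega q_1^* : \varepsilon(v)\,dx + \int_{\Gamma_C} q_2^*\cdot v_\tau\,ds = \int_\Omega \big[ C\varepsilon(u_{n-1}^k) : \varepsilon(v) - f_{1n}\cdot v \big] dx - \int_{\Gamma_N} f_{2n}\cdot v\,ds \quad \forall\, v \in V. \tag{3.147}
\]

Using (3.147) and regrouping terms in (3.146) we find that for any $q^* \in Q^*_{n,g}$ and $r^* \in (L^2(\Omega))^{d\times d}$,

\[
\begin{aligned}
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2
&\le \int_\Omega \frac{1}{k_n}\,C(k_n\varepsilon(w_n^{hk}) + Ar^*) : (k_n\varepsilon(w_n^{hk}) + Ar^*)\,dx \\
&\quad + \int_\Omega \frac{1}{k_n}\,A(q_1^* - r^*) : (q_1^* - r^*)\,dx + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds.
\end{aligned}
\]

Thus, for any $r^* \in (L^2(\Omega))^{d\times d}$,

\[
\begin{aligned}
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2
&\le \int_\Omega \frac{1}{k_n}\,C(k_n\varepsilon(w_n^{hk}) + Ar^*) : (k_n\varepsilon(w_n^{hk}) + Ar^*)\,dx \\
&\quad + \inf_{q^* \in Q^*_{n,g}} \Big\{ \int_\Omega \frac{1}{k_n}\,A(q_1^* - r^*) : (q_1^* - r^*)\,dx + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\}.
\end{aligned}
\tag{3.148}
\]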
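The second term

\[
II = \inf_{q^* \in Q^*_{n,g}} \Big\{ \int_\Omega \frac{1}{k_n}\,A(q_1^* - r^*) : (q_1^* - r^*)\,dx + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\}
\]

on the right-hand side of estimate (3.148) is bounded as follows. First, from the definition (3.143), the term $II$ equals

\[
\begin{aligned}
\inf_{\substack{q_1^* \in (L^2(\Omega))^{d\times d}\\ |q_2^*| \le g}}\ \sup_{v \in V} \Big\{ & \int_\Omega \frac{1}{k_n}\,A(q_1^* - r^*) : (q_1^* - r^*)\,dx + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \\
& + \int_\Omega \big[ (q_1^* - C\varepsilon(u_{n-1}^k)) : \varepsilon(v) + f_{1n}\cdot v \big] dx + \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} q_2^*\cdot v_\tau\,ds \Big\}.
\end{aligned}
\]

Here and below, the condition “$|q_2^*| \le g$” stands for “$|q_2^*| \le g$ a.e. on $\Gamma_C$.” Substitute $q_1^* - r^*$ by $q_1^*$ and regroup the terms to see that $II$ equals

\[
\begin{aligned}
\inf_{|q_2^*| \le g}\ \sup_{v \in V} \Big\{ & \int_\Omega \Big[ -\frac{k_n}{4}\,C\varepsilon(v) : \varepsilon(v) + (r^* - C\varepsilon(u_{n-1}^k)) : \varepsilon(v) + f_{1n}\cdot v \Big] dx \\
& + \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} q_2^*\cdot v_\tau\,ds + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\}.
\end{aligned}
\]

Define the residual

\[
R_n(q_2^*, r^*) = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ \int_\Omega \big[ (r^* - C\varepsilon(u_{n-1}^{hk})) : \varepsilon(v) + f_{1n}\cdot v \big] dx + \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} q_2^*\cdot v_\tau\,ds \Big\}. \tag{3.149}
\]

Then

\[
\begin{aligned}
II &\le \inf_{|q_2^*| \le g}\ \sup_{v \in V} \Big\{ -\frac{k_n}{4}\,\|v\|_V^2 + \big( R_n(q_2^*, r^*) + \|u_{n-1}^k - u_{n-1}^{hk}\|_V \big)\|v\|_V + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\} \\
&= \inf_{|q_2^*| \le g} \Big\{ \frac{1}{k_n}\big( R_n(q_2^*, r^*) + \|u_{n-1}^k - u_{n-1}^{hk}\|_V \big)^2 + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\}.
\end{aligned}
\]

Therefore, we have the following result.

Theorem 3.12. Let $w_n^k \in V$ be the unique solution of (3.127), and $w_n^{hk} \in V$ an approximation. Then for any $r^* \in (L^2(\Omega))^{d\times d}$, the following error bound

\[
\begin{aligned}
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2
&\le \int_\Omega \frac{1}{k_n}\,C(k_n\varepsilon(w_n^{hk}) + Ar^*) : (k_n\varepsilon(w_n^{hk}) + Ar^*)\,dx \\
&\quad + \inf_{|q_2^*| \le g} \Big\{ \frac{1}{k_n}\big( R_n(q_2^*, r^*) + \|u_{n-1}^k - u_{n-1}^{hk}\|_V \big)^2 + \int_{\Gamma_C} \big( g\,|w_{n\tau}^{hk}| + q_2^*\cdot w_{n\tau}^{hk} \big) ds \Big\}
\end{aligned}
\]

is valid, where the residual $R_n(q_2^*, r^*)$ is defined by (3.149).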


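Theorem 3.12 provides a general framework for various a posteriori error estimates with different choices of the auxiliary variable $r^*$. From now on, $w_n^{hk}$ is the finite element solution defined by (3.136). We choose $r^* = -k_n C\varepsilon(w_n^{hk})$, and then replace the infimum over $q_2^*$ by taking $q_2^* = -g\lambda_{n\tau}^{hk}$. Then Theorem 3.12 leads to the following corollary.

Corollary 3.1. Let $u_n^k \in V$ be the unique solution of (3.126) and $u_n^{hk} \in V^h$ its finite element approximation defined by (3.135). Then there exists a constant $C$ such that the following a posteriori error estimate holds:

\[
\|u_n^k - u_n^{hk}\|_V \le C R_n + C\,\|u_{n-1}^k - u_{n-1}^{hk}\|_V, \tag{3.150}
\]

where

\[
R_n = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ \int_\Omega \big[ C\varepsilon(u_n^{hk}) : \varepsilon(v) - f_{1n}\cdot v \big] dx - \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} g\,\lambda_{n\tau}^{hk}\cdot v_\tau\,ds \Big\}. \tag{3.151}
\]

Let $\Pi_h$ be the interpolation operator defined by (3.130). Substituting $v$ by $v - \Pi_h v$ in (3.151) we obtain

\[
\begin{aligned}
R_n = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ & \int_\Omega \big[ C\varepsilon(u_n^{hk}) : \varepsilon(v - \Pi_h v) - f_{1n}\cdot(v - \Pi_h v) \big] dx \\
& - \int_{\Gamma_N} f_{2n}\cdot(v - \Pi_h v)\,ds + \int_{\Gamma_C} g\,\lambda_{n\tau}^{hk}\cdot(v - \Pi_h v)_\tau\,ds \Big\}.
\end{aligned}
\]

Decompose the integrals into local contributions from each element $K \in P_h$, and apply Green's formula over $K$ to find

\[
\begin{aligned}
R_n = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ & \sum_{K \in P_h} \int_K \big( -\operatorname{Div}\sigma(u_n^{hk}) - f_{1n} \big)\cdot(v - \Pi_h v)\,dx
+ \sum_{\gamma \in E_{h,\Omega}} \int_\gamma [\sigma(u_n^{hk})\nu]\cdot(v - \Pi_h v)\,ds \\
& + \sum_{\gamma \in E_{h,\Gamma_N}} \int_\gamma \big( \sigma(u_n^{hk})\nu - f_{2n} \big)\cdot(v - \Pi_h v)\,ds
+ \sum_{\gamma \in E_{h,\Gamma_C}} \int_\gamma \big( \sigma_\tau(u_n^{hk}) + g\lambda_{n\tau}^{hk} \big)\cdot(v - \Pi_h v)_\tau\,ds \Big\},
\end{aligned}
\tag{3.152}
\]

where we used the decomposition $\sigma\nu\cdot v = \sigma_\nu v_\nu + \sigma_\tau\cdot v_\tau = \sigma_\tau\cdot v_\tau$ a.e. on $\Gamma_C$. Define the interior residuals for each element $K \in P_h$ by

\[
r_K = -\operatorname{Div}\sigma(u_n^{hk}) - f_{1n} \quad \text{in } K, \tag{3.153}
\]

and side residuals for each side $\gamma \in E_h$ by

\[
R_\gamma = \begin{cases}
[\sigma(u_n^{hk})\nu] & \text{if } \gamma \in E_{h,\Omega},\\
\sigma(u_n^{hk})\nu - f_{2n} & \text{if } \gamma \in E_{h,\Gamma_N},\\
\sigma_\tau(u_n^{hk}) + g\lambda_{n\tau}^{hk} & \text{if } \gamma \in E_{h,\Gamma_C}.
\end{cases}
\tag{3.154}
\]

Note that residuals corresponding to the sides lying on $\Gamma_D$ are considered to be 0. By using definitions (3.153) and (3.154), relation (3.152) reduces to

\[
R_n = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ \sum_{K \in P_h} \int_K r_K\cdot(v - \Pi_h v)\,dx + \sum_{\gamma \in E_h} \int_\gamma R_\gamma\cdot(v - \Pi_h v)\,ds \Big\}. \tag{3.155}
\]

Using the estimates (3.133) and (3.134) in (3.155), and applying the Cauchy-Schwarz inequality, we get

\[
\begin{aligned}
R_n &\le \sup_{v \in V} \frac{1}{\|v\|_V}\,C\,\|v\|_V \Big( \sum_{K \in P_h} h_K^2\|r_K\|_{0,K}^2 + \sum_{\gamma \in E_h} h_\gamma\|R_\gamma\|_{0,\gamma}^2 \Big)^{1/2} \\
&\le C \Big( \sum_{K \in P_h} h_K^2\|r_K\|_{0,K}^2 + \sum_{\gamma \in E_h} h_\gamma\|R_\gamma\|_{0,\gamma}^2 \Big)^{1/2}.
\end{aligned}
\tag{3.156}
\]

We summarize the above results in the form of a theorem.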


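Theorem 3.13. Let $u_n^k \in V$ and $u_n^{hk} \in V^h$ be the unique solutions of (3.126) and (3.135), respectively. Then the following a posteriori error estimate holds:

\[
\|u_n^k - u_n^{hk}\|_V \le C \Big( \sum_{K \in P_h} h_K^2\|r_K\|_{0,K}^2 + \sum_{\gamma \in E_h} h_\gamma\|R_\gamma\|_{0,\gamma}^2 \Big)^{1/2} + C\,\|u_{n-1}^k - u_{n-1}^{hk}\|_V, \tag{3.157}
\]

where $r_K$ and $R_\gamma$ are interior and, respectively, side residuals, defined by (3.153) and (3.154).

In practical computations, the terms on the right side of (3.157) are regrouped by writing

\[
\|u_n^k - u_n^{hk}\|_V \le C \Big( \sum_{K \in P_h} \eta_{K,R}^2 \Big)^{1/2} + C\,\|u_{n-1}^k - u_{n-1}^{hk}\|_V, \tag{3.158}
\]

where the local error indicator $\eta_{K,R}$ on each element $K$, defined by

\[
\eta_{K,R}^2 = h_K^2\|r_K\|_{0,K}^2 + \frac{1}{2} \sum_{\gamma \in E(K)\cap E_{h,\Omega}} h_\gamma\|R_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in E(K)\cap E_{h,\Gamma_N}} h_\gamma\|R_\gamma\|_{0,\gamma}^2 + \sum_{\gamma \in E(K)\cap E_{h,\Gamma_C}} h_\gamma\|R_\gamma\|_{0,\gamma}^2, \tag{3.159}
\]

identifies contributions from each of the elements to the global error.

Similar to the derivation given in the second half of Section 3.4, we have the following inequality on the efficiency of the error estimate:

\[
\begin{aligned}
\sum_{K \in P_h} \eta_{K,R}^2 \le C \Big( & \|u_n^k - u_n^{hk}\|_V^2 + \sum_{\gamma \in E_{h,\Gamma_C}} h_\gamma\|\lambda_{n\tau}^k - \lambda_{n\tau}^{hk}\|_{0,\gamma}^2 \\
& + \sum_{K \in P_h} h_K^2\|f_{1n} - f_{1n,K}\|_{0,K}^2 + \sum_{\gamma \in E_{h,\Gamma_N}} h_\gamma\|f_{2n} - f_{2n,\gamma}\|_{0,\gamma}^2 \\
& + \sum_{\gamma \in E_{h,\Gamma_C}} h_\gamma\|\lambda_{n\tau}^{hk} - \lambda_{n\tau,\gamma}^{hk}\|_{0,\gamma}^2 \Big)
\end{aligned}
\tag{3.160}
\]

with discontinuous piecewise polynomial approximations $f_{1n,K}$, $f_{2n,\gamma}$, and $\lambda_{n\tau,\gamma}^{hk}$ of $f_{1n}$, $f_{2n}$, and $\lambda_{n\tau}^{hk}$.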

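We can apply Theorem 3.12 to derive a stress recovery type a posteriori error estimate, and we restrict our discussion to linear elements. Similar to (3.58)-(3.59), we define the stress recovery operator $G_h : V^h \to (V^h)^d$ as follows:

\[
G_h\sigma(v^h) = \sigma^*(v^h) = \sum_{a \in N_h} \sigma_a^*\,\varphi_a, \qquad \sigma_a^* = \frac{1}{|\tilde K_a|} \int_{\tilde K_a} \sigma(v^h)\,dx. \tag{3.161}
\]

In the case of linear elements

\[
\sigma_a^* = \sum_{i=1}^{N_a} \alpha_a^i\,(\sigma(v^h))_{K_a^i}, \tag{3.162}
\]

where $(\sigma(v^h))_{K_a^i}$ denotes the tensor value of the stress $\sigma(v^h)$ on the element $K_a^i$, $\tilde K_a = \bigcup_{i=1}^{N_a} K_a^i$, $\alpha_a^i = |K_a^i|/|\tilde K_a|$, $i = 1, \ldots, N_a$.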
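For linear elements the recovered nodal stress is an area-weighted average of the element-wise constant stresses on the patch around the node. The following minimal sketch computes one such nodal value, assuming each stress is stored as a tuple of independent components; the function name and the sample patch are hypothetical.

```python
def recovered_nodal_stress(areas, element_stresses):
    """Area-weighted average of element-wise constant stresses on a nodal patch.

    areas             areas |K_i| of the elements sharing the node
    element_stresses  one tuple of stress components per element, e.g. (s11, s22, s12)
    """
    patch_area = sum(areas)
    n_comp = len(element_stresses[0])
    return tuple(
        sum((area / patch_area) * stress[c] for area, stress in zip(areas, element_stresses))
        for c in range(n_comp)
    )


# Hypothetical patch of two elements with areas 1 and 3.
sigma_star = recovered_nodal_stress([1.0, 3.0], [(4.0, 0.0, 2.0), (8.0, 4.0, 2.0)])
```

The weights here are 1/4 and 3/4, so the larger element dominates the recovered value; a component that is identical on all elements of the patch is reproduced exactly.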
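Let $u_n^{hk}$ and $u_{n-1}^{hk}$ be the finite element solutions defined by (3.135) at the time steps $n$ and $n - 1$, respectively. Let $r^* = -\sigma^*(u_n^{hk}) + \sigma(u_{n-1}^{hk})$ and replace the infimum over $q_2^*$ by taking $q_2^* = -g\lambda_{n\tau}^{hk}$, where $\sigma^*(u_n^{hk})$ is the recovered stress defined by (3.161) and $\lambda_{n\tau}^{hk}$ is provided by (3.139)-(3.140). With these choices, the estimate of Theorem 3.12 becomes

\[
\begin{aligned}
\frac{k_n}{2}\,\|w_n^k - w_n^{hk}\|_V^2
&\le \frac{1}{k_n} \int_\Omega \big( \sigma(u_n^{hk}) - \sigma^*(u_n^{hk}) \big) : \big( \varepsilon(u_n^{hk}) - \varepsilon^*(u_n^{hk}) \big)\,dx + \frac{1}{k_n}\big( R_n + \|u_{n-1}^k - u_{n-1}^{hk}\|_V \big)^2 \\
&\le \frac{C}{k_n}\,\|\sigma(u_n^{hk}) - \sigma^*(u_n^{hk})\|_{0,\Omega}^2 + \frac{1}{k_n}\big( R_n + \|u_{n-1}^k - u_{n-1}^{hk}\|_V \big)^2,
\end{aligned}
\tag{3.163}
\]

where $\varepsilon^*(u_n^{hk}) = A\sigma^*(u_n^{hk})$ and the residual $R_n$ is given by

\[
R_n = \sup_{v \in V} \frac{1}{\|v\|_V} \Big\{ \int_\Omega \big[ \sigma^*(u_n^{hk}) : \varepsilon(v) - f_{1n}\cdot v \big] dx - \int_{\Gamma_N} f_{2n}\cdot v\,ds + \int_{\Gamma_C} g\,\lambda_{n\tau}^{hk}\cdot v_\tau\,ds \Big\}. \tag{3.164}
\]

We can then obtain the following result.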


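Theorem 3.14. Let $u_n^k \in V$ and $u_n^{hk} \in V^h$ be the unique solutions of (3.126) and (3.135), respectively. Then the following a posteriori error estimate holds:

\[
\|u_n^k - u_n^{hk}\|_V \le C \Big( \sum_{K \in P_h} \eta_{K,G}^2 \Big)^{1/2} + C \Big( \sum_{a \in N_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}^d}\|f_{1n} - f_a\|_{0,\tilde K_a}^2 \Big)^{1/2} + C\,\|u_{n-1}^k - u_{n-1}^{hk}\|_V, \tag{3.165}
\]

where local indicators $\eta_{K,G}$ are computed for every element $K \in P_h$ by

\[
\eta_{K,G}^2 = \|\sigma(u_n^{hk}) - \sigma^*(u_n^{hk})\|_{0,K}^2 + \sum_{\gamma \in E(K)\cap E_{h,\Gamma_N}} h_\gamma\|\sigma^*(u_n^{hk})\nu - f_{2n}\|_{0,\gamma}^2 + \sum_{\gamma \in E(K)\cap E_{h,\Gamma_C}} h_\gamma\|\sigma_\tau^*(u_n^{hk}) + g\lambda_{n\tau}^{hk}\|_{0,\gamma}^2. \tag{3.166}
\]

We observe that if $f_{1n} \in (L^2(\Omega))^d$ then

\[
\Big( \sum_{a \in N_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}^d}\|f_{1n} - f_a\|_{0,\tilde K_a}^2 \Big)^{1/2} = o(h),
\]

and if $f_{1n} \in (H^1(\Omega))^d$, then

\[
\Big( \sum_{a \in N_{h,0}} h_a^2 \min_{f_a \in \mathbb{R}^d}\|f_{1n} - f_a\|_{0,\tilde K_a}^2 \Big)^{1/2} = O(h^2).
\]

Theorem 3.14 asserts that the estimator $\eta_G = \big( \sum_{K \in P_h} \eta_{K,G}^2 \big)^{1/2}$ is a reliable upper bound of a constant multiple of the error $\|u_n^k - u_n^{hk}\|_V$. We can also show the following inequality:

\[
\begin{aligned}
\sum_{K \in P_h} \eta_{K,G}^2 \le C \Big( & \|u_n^k - u_n^{hk}\|_V^2 + \sum_{\gamma \in E_{h,\Gamma_C}} h_\gamma\|\lambda_{n\tau}^k - \lambda_{n\tau}^{hk}\|_{0,\gamma}^2 \\
& + \sum_{K \in P_h} h_K^2\|f_{1n} - f_{1n,K}\|_{0,K}^2 + \sum_{\gamma \in E_{h,\Gamma_N}} h_\gamma\|f_{2n} - f_{2n,\gamma}\|_{0,\gamma}^2 \\
& + \sum_{\gamma \in E_{h,\Gamma_C}} h_\gamma\|\lambda_{n\tau}^{hk} - \lambda_{n\tau,\gamma}^{hk}\|_{0,\gamma}^2 \Big)
\end{aligned}
\tag{3.167}
\]

with discontinuous piecewise polynomial approximations $f_{1n,K}$, $f_{2n,\gamma}$, and $\lambda_{n\tau,\gamma}^{hk}$ of $f_{1n}$, $f_{2n}$, and $\lambda_{n\tau}^{hk}$.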
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3.10 Numerical Example on the Quasistatic Contact 
Problem 


We follow the procedure described in Section 3.6 for the finite element approximation.

In order to show the effectiveness of the adaptive procedure we compare numerical convergence orders of the approximate solutions. We compute these orders by considering families of uniform and locally refined triangulations. Consider a sequence of finite element solutions $\{u_{n,un}^{hk}\}_{n=0}^N$ based on uniform triangulations of the domain $\Omega$ and the same uniform partition of the time interval. Starting with an initial coarse triangulation $P_1$, we construct a family of nested meshes by the subdivision of each triangle into four triangles. The solution from the most refined mesh is taken as the “true” solution $\{u_n^k\}_{n=0}^N$ that is used to compute the errors of the approximate solutions obtained on the other meshes.

The finite element solution $\{u_{n,ad}^{hk}\}_{n=0}^N$ is obtained using the following adaptive algorithm:


1. Start with the initial triangulation $P_h$ and corresponding finite element subspace $V^h$.

2. Compute the finite element solution $\{u_{n,ad}^{hk}\}_{n=0}^N$, where $u_{n,ad}^{hk} \in V^h$, $0 \le n \le N$.

3. At the time step $t_N$ and for each element $K \in P_h$, compute the error estimator $\eta_{K,I}$ of residual type ($I = R$) or gradient recovery type ($I = G$).

4. Let $\bar\eta = \big(\sum_{K \in P_h} \eta_{K,I}\big)/N_e$, where $N_e$ is the total number of elements. An element $K$ is marked for refinement if $\eta_{K,I} > \mu\,\bar\eta$, where $\mu$ is a prescribed threshold. In our example, $\mu = 1$.

5. Perform refinement and obtain a new triangulation $P_h$.

6. Return to step 2.
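The marking rule in step 4 can be sketched in a few lines. This is a minimal illustration, assuming the element indicators are already available as a list of non-negative numbers; the function name `mark_elements` and the sample values are hypothetical, and the solve and refine steps are not shown.

```python
def mark_elements(indicators, mu=1.0):
    """Return indices of elements whose indicator exceeds mu times the mean indicator."""
    mean_indicator = sum(indicators) / len(indicators)
    return [i for i, eta in enumerate(indicators) if eta > mu * mean_indicator]


# Hypothetical indicators on four elements; the mean is 3.
sample_indicators = [1.0, 2.0, 3.0, 6.0]
marked_default = mark_elements(sample_indicators)          # threshold mu = 1
marked_loose = mark_elements(sample_indicators, mu=0.5)    # a smaller threshold marks more
```

Because the comparison is strict, an element whose indicator equals the threshold value is not marked; lowering the threshold parameter refines more aggressively.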


Example 3.4. We consider the physical setting shown in Figure 3.35. The domain $\Omega = (0,10)\times(0,2)$ is the cross-section of a three-dimensional linearly elastic body and the plane stress condition is assumed. On the part $\Gamma_D = (0,10)\times\{2\}$ the body is clamped. Oblique tractions act on the parts $\{0\}\times(0,2)$ and $\{10\}\times(0,2)$. Thus $\Gamma_N = (\{0\}\times(0,2)) \cup (\{10\}\times(0,2))$. The contact part of the boundary is $\Gamma_C = (0,10)\times\{0\}$.

Fig. 3.35 Setting of the problem.

Fig. 3.36 Initial mesh.

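The elasticity tensor $C$ satisfies

\[
(C\varepsilon)_{ij} = \frac{E\nu}{1-\nu^2}(\varepsilon_{11}+\varepsilon_{22})\,\delta_{ij} + \frac{E}{1+\nu}\,\varepsilon_{ij},
\]

where $E$ is the Young's modulus, $\nu$ is the Poisson's ratio of the material, and $\delta_{ij}$ is the Kronecker symbol. We use the following data:

\[
\begin{aligned}
&E = 1000\ \text{daN/mm}^2, \quad \nu = 0.3,\\
&f_1 = (0,0)\ \text{daN/mm}^2,\\
&f_2(x_1,x_2,t) = (2.5t,\,0)\ \text{daN/mm}^2 \quad \text{if } (x_1,x_2) \in \{0\}\times(0,2),\\
&f_2(x_1,x_2,t) = (2(x_2-2)t,\,-t)\ \text{daN/mm}^2 \quad \text{if } (x_1,x_2) \in \{10\}\times(0,2),\\
&g = 1\ \text{daN/mm}^2, \quad u_0 = 0\ \text{m}, \quad T = 1\ \text{sec}.
\end{aligned}
\]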


The time step size considered is $k = 0.2$ sec. We start with the initial triangulation $P_1$ (160 elements, 105 nodes) shown in Figure 3.36. Here, the interval $[0,1]$ is divided into $1/h$ equal parts with $h = 1/2$, which is successively halved. The numerical solution corresponding to $h = 1/64$ is taken as the “true” solution $\{u_n^k\}_{n=0}^N$. To have an idea of the convergence behaviour of the discrete Lagrange multipliers, we compute the errors $\max_n \|\lambda_{n\tau}^k - \lambda_{n\tau}^{hk}\|_{0,\Gamma_C}$ corresponding to the sequence of uniform refinements. Here, $\{\lambda_{n\tau}^k\}_{n=0}^N$ is the Lagrange multiplier corresponding to the parameter $h = 1/64$. Figure 3.43 provides the relative error values $\max_n \|u_n^k - u_{n,un}^{hk}\|_V$ and $h^{1/2}\max_n \|\lambda_{n\tau}^k - \lambda_{n\tau}^{hk}\|_{0,\Gamma_C}$. The numerical convergence order of $h^{1/2}\max_n \|\lambda_{n\tau}^k - \lambda_{n\tau}^{hk}\|_{0,\Gamma_C}$ is visibly higher than that of $\max_n \|u_n^k - u_{n,un}^{hk}\|_V$, indicating that the second term in the efficiency bounds (3.160) and (3.167) is expected to be of higher order compared to the first term $\|u_n^k - u_n^{hk}\|_V^2$. Graphs of $\lambda_{h,1}$ and $\lambda_{h,2}$ with $h = 1/64$ are provided in Figures 3.39 and 3.40.

We use an adaptive procedure based on both residual type and gradient recovery type estimates to obtain a sequence of approximate solutions $\{u_{n,ad}^{hk}\}_{n=0}^N$. The deformed configuration and the adaptive finite element mesh after four adaptive iterations are shown in Figure 3.37 (based on the residual type estimator, 4792 elements, 2531 nodes) and Figure 3.38 (based on the recovery type estimator, 4894 elements, 2583 nodes). Figures 3.41 and 3.42 contain the relative error values $\max_n \|u_n^k - u_{n,un}^{hk}\|_V$ and $\max_n \|u_n^k - u_{n,ad}^{hk}\|_V$. We observe a substantial improvement of the efficiency using adaptively refined meshes. Figures 3.44 and 3.45 provide the values of $\eta_I = \big(\sum_{K \in P_h} \eta_{K,I}^2\big)^{1/2}$, $I \in \{R, G\}$, where $\eta_{K,I}$ are computed using either the residual type estimator ($I = R$) or the recovery type estimator ($I = G$) on both uniform and adapted meshes. Table 3.6 contains the values of $C_I$ computed for uniform and adaptive solutions:


\[
C_I = \max_n \frac{\|u_n^k - u_n^{hk}\|_V}{\|u_n^k\|_V\,\eta_I}, \qquad I \in \{R, G\}.
\]


Table 3.6 Numerical values of Cr and Cg 


31775 


7.996-02 7-56-02 
2.Te-01 


3.18e-01[2.94e-01]2.92e-01]2.77e-01| 


° 1 2 3 4 5 6 7 8 9 10 


Fig. 3.37 Residual type estimator. Deformed configuration (amplified by 100) at t = 1 sec. 


Fig. 3.38 Recovery type estimator. Deformed configuration (amplified by 100) at t = 
1 sec. 


Fig. 3.40 Plot of AR at t = 1 sec. 

Fig. 3.41 Residual type estimator. $\max_n \|u^k_n - u^{hk}_{n,ad}\|_V$ (□) versus
$\max_n \|u^k_n - u^{hk}_{n,un}\|_V$ (△).


Fig. 3.42 Recovery type estimator. $\max_n \|u^k_n - u^{hk}_{n,ad}\|_V$ (□) versus
$\max_n \|u^k_n - u^{hk}_{n,un}\|_V$ (△).

3.11 Concluding Remarks


This chapter presents a general framework on a posteriori error estimation 
for finite element solutions of some variational inequalities, through the em- 
ployment of the duality theory in convex analysis. The general error estimate 
contains an auxiliary variable. Different choices of the auxiliary variable lead
to different a posteriori error bounds; in particular, residual type and recovery 
type error estimates are shown and analyzed in this chapter. The a posteriori 
error estimates are used in developing adaptive finite element algorithms for 
the variational inequalities. The effectiveness of the adaptive finite element 
algorithms is demonstrated in several numerical examples. 

Due to the inequality nature of the problems, efficiency bounds for the 
error estimators contain terms related to approximations of Lagrange multi- 
pliers; see, for example, (3.57) and (3.82). Currently, sharp bounds for such 
terms are unknown. Ideally, it is hoped that terms such as 


$$\sum_{\gamma \in \mathcal{E}_{h,\Gamma}} h \, \|\lambda - \lambda_h\|_{0,\gamma}^2$$


can be bounded by $o(h^2)$, and then efficiency of the error estimators can
be rigorously deduced from (3.57) and (3.82). Numerical results reported 
in the chapter strongly suggest the terms involving the Lagrange multiplier 
approximations are of higher order, and performance of the adaptive finite 
element algorithms is not sensitive with respect to the calculation of the 
Lagrange multipliers. 

The derivation and analysis of the a posteriori error estimates are done on 
some model problems in this chapter, and can be extended to other or more 
general variational inequalities, including those arising in frictional contact 
mechanics. Another direction for future research is to develop adaptive al- 
gorithms for simultaneous time and space discretizations of time-dependent 
variational inequalities, and their applications in solving evolutionary prob- 
lems in contact mechanics. 
Chapter 4
Time-Frequency Analysis of
Brain Neurodynamics


W. Art Chaovalitwongse, W. Suharitdamrong, and P.M. Pardalos 


Summary. The characteristics of neurodynamics of intracranial electroen- 
cephalogram (EEG) at different frequency bands were investigated in a sam- 
ple of two patients with epilepsy. The results indicate a tendency for the 
delta, theta, and alpha frequency bands in EEG signals to have a higher
dimensional complexity than the beta and gamma frequency bands. We also 
investigate the time-frequency component decomposition of EEG signals and 
observe very different perceptual complexity and a difference in evoked spec- 
tral responses, which could be a reflection of neuronal recruitment that trig- 
gers the epileptogenic process. The results of this study may provide insights 
into the brain network’s mechanism by which local and regional circuits can
continuously form and reform with different regions functionally disconnected 
from other brain areas. 


Key words: EEG, brain dynamics, time-frequency distribution, chaos the- 
ory, optimization 


4.1 Introduction 


The electroencephalogram (EEG) measures brainwaves of different frequen- 
cies within specific areas of the brain. Electrodes are implanted and placed 
on specific sites in the temporal lobe or the scalp to monitor and record the
electrical impulses and neuronal activities in the brain. The extent of the 
noise effect depends on whether the additive noise caused by the remote elec- 
trodes is sufficient to induce important variations of the dynamics across the 
electrodes. It is well known that electrocortical activity exhibits microchaos 
while macroscopically behaving as a linear, near-equilibrium system. For this 
reason, EEG time series recorded at macroscopic scale may exhibit chaos, and 
it is common for physical phenomena to exhibit different dynamic features 
at different scales. Characterizing EEG signals by using measures of chaos 
derived from the theory of nonlinear dynamics has been proven to accurately 
reflect underlying attractor properties. The study of nonlinear dynamics in- 
volves the statistical investigation of molecular motion, sound waves, the 
weather, and other dynamical systems. As EEG exhibits particular dynamic 
properties at particular scales, which can be called far-from-equilibrium cellu- 
lar and small cellular group interactions, the properties of EEG contrast with 
near-equilibrium dynamics and linear wave superposition at the macroscopic 
level. In this research, we focus on a question of whether the macroscopic 
EEG time series preserves chaotic dynamical features carried through from 
the underlying neural events driving the wave motion and how to apply a 
statistical analysis to study neurodynamics of EEG time series. Before one 
could claim to understand the neurodynamics of cortical neurons, one needs 
to understand how their dynamics differ with scale and how they interact 
across scale associated with the brain electrical activity. In this research, we 
direct our application to the epilepsy research. We investigate the neurody- 
namics of the EEG and its sensitivity to different frequency bands linked to 
epileptic waveforms in the brain of patients with epilepsy. 


4.1.1 The Fact About Epilepsy and EEG 


Epilepsy is the second most common serious brain disorder after stroke. 
Worldwide, at least 40 million people or 1% of the population currently suf- 
fer from epilepsy. Epilepsy is a chronic condition of diverse etiologies with 
the common symptom of spontaneous recurrent seizures, which are character-
ized by intermittent paroxysmal and highly organized rhythmic neuronal dis-
charges in the cerebral cortex. Seizures, which typically last from seconds to
a few minutes, can temporarily disrupt normal brain functions such as motor
control, responsiveness, and recall. There is a localized structural change in
neuronal circuitry within the cerebrum which produces organized quasirhyth- 
mic discharges in some types of epilepsy (i.e., focal or partial epilepsy). These 
discharges then spread from the region of origin (epileptogenic zone) to acti- 
vate other areas of the cerebral hemisphere. Nonetheless, the mechanism by 
which these fixed disturbances in local circuitry produce intermittent distur- 
bances of brain function is not well comprehended. The development of the 
epileptic state can be considered as changes in network circuitry of neurons
in the brain. When neuronal networks are activated, they produce changes in 
voltage potential, which can be captured by EEG. These changes are reflected 
by wriggling lines along the time axis in a typical EEG recording. A typical 
electrode montage for EEG recordings in our study is shown in Figure 4.1. 


(A) (B) 


Fig. 4.1 Inferior transverse views of the brain, illustrating approximate depth and sub- 
dural electrode placement for EEG recordings. Subdural electrode strips are placed over 
the left orbitofrontal (LOF), right orbitofrontal (ROF), left subtemporal (LST), and right 
subtemporal (RST) cortex. Depth electrodes are placed in the left temporal depth (LTD) 
and right temporal depth (RTD) to record hippocampal activity. 


The model for macroscopic neurodynamics depends on the epileptiforms 
in EEG time series (with the presence of noise) at different frequency bands. 
The effect of electrical activities at different frequency bands upon micro- 
scopic neural activity of high chaoticity level of the signal can be used to 
portray the mutual information between macroscopic pools of neurons in the 
brain. As instantaneous natural frequencies, damping factors, and coupling 
coefficients describing the dynamics of small pools of coupled neurons are sto- 
chastically independent, macroscopic neuronal wave motions are assumed to 
obey superposition and position near-equilibrium. This assumption leads to 
the intriguing possibility that local coherence among pools of neuronal cells 
might be generated from subsets of cells currently engaged in high-amplitude 
microscopic chaos, the local pool then providing a driving signal to the linear 
and near-equilibrium macroscopic dynamics. This mechanism appears to offer 
a further means of bridging dynamic interactions across scale. For this rea- 
son, the research is motivated to apply the nonlinear measures based on the 
theory of nonlinear dynamics as they have been proved capable of capturing 
the microscopic and macroscopic chaos in EEG time series. 

Fig. 4.2 Twenty second EEG recording of the pre-ictal state of a typical epileptic seizure
obtained from 32 electrodes. Each horizontal trace represents the voltage recorded from 
electrode sites listed in the left column (see Figure 4.1 for anatomical location of electrodes). 

Fig. 4.3 Twenty second EEG recording of the post-ictal state of a typical epileptic seizure
obtained from 32 electrodes. Each horizontal trace represents the voltage recorded from 
electrode sites listed in the left column (see Figure 4.1 for anatomical location of electrodes). 


4.1.2 EEG Frequency Bands 


The EEG measures brainwaves of different frequencies within the brain. Elec- 
trodes are placed on specific sites on the scalp to detect and record the elec- 
trical impulses within the brain as shown in Figures 4.1 to 4.3. 
The raw EEG has usually been described in terms of frequency bands:
Delta (0.1-3 Hz), Theta (4-8 Hz), Alpha (8-12 Hz), Beta (13-30 Hz), and 
Gamma (above 30 Hz). 
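
As a small illustration of the band limits just listed, the following Python sketch maps a frequency to a band name. The function name `band_of` is hypothetical, and two choices are assumptions of this example rather than part of the classification above: the unlisted gaps (3 to 4 Hz and 12 to 13 Hz) are assigned to the lower band, and the shared edge at 8 Hz is assigned to Alpha.

```python
def band_of(freq_hz):
    """Return the EEG band name for a frequency in Hz, or None below 0.1 Hz.

    Band limits follow the list in the text; the handling of gaps and
    shared edges is an assumption made for this sketch.
    """
    if freq_hz < 0.1:
        return None
    if freq_hz < 4.0:      # Delta: 0.1-3 Hz, extended up to 4 Hz
        return "delta"
    if freq_hz < 8.0:      # Theta: 4-8 Hz
        return "theta"
    if freq_hz < 13.0:     # Alpha: 8-12 Hz, extended up to 13 Hz
        return "alpha"
    if freq_hz <= 30.0:    # Beta: 13-30 Hz
        return "beta"
    return "gamma"         # Gamma: above 30 Hz
```

For instance, `band_of(10.0)` falls in the Alpha range and `band_of(40.0)` in the Gamma range.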


Delta (0.1-3 Hz) 


The EEG Delta waves are less than 4 Hz and occur in deep sleep, in
some abnormal processes, and during experiences of an “empathy state.” Delta
waves are involved with our ability to integrate and release. As it reflects the 
unconscious mind, the information in our unconscious mind can be accessed 
through Delta. It is the dominant rhythm in infants up to one year of age 
and it is present in stages 3 and 4 of sleep. The EEG Delta waves tend to be 
the highest in amplitude and the slowest waves. However, most individuals 
diagnosed with attention deficit disorder naturally increase rather than
decrease Delta activity when trying to focus. The inappropriate Delta response
often severely restricts the ability to focus and maintain attention. It is as if 
the brain is locked into a perpetual drowsy state [8]. 


Theta (4-8 Hz) 

Theta activity has a frequency of 4 to 8 Hz and is classified as “slow” ac- 
tivity. It can be seen in connection with creativity, intuition, daydreaming, 
and fantasizing and is a repository for memories, emotions, and sensations. 
Theta waves are strong during internal focus, meditation, prayer, and spir- 
itual awareness. It reflects the state between wakefulness and sleep. Theta 
is believed to reflect activity from the limbic system and hippocampal re- 
gions. Theta is observed in anxiety, behavioral activation, and behavioral 
inhibition [8]. 


Alpha (8-12 Hz) 


Alpha waves are those between 8 and 12 Hz. Good healthy alpha production 
promotes mental resourcefulness, aids in the ability to mentally coordinate, 
and enhances an overall sense of relaxation and fatigue. When Alpha pre- 
dominates most people feel at ease and calm. Alpha also appears to bridge 
the conscious to the subconscious. Alpha rhythms are reported to be derived 
from the white matter of the brain. The white matter can be considered the 
part of the brain that connects all parts with each other. Alpha is a common 
state for the brain and occurs whenever a person is alert, but not actively 
processing information. It can also be used as a marker for alertness and 
sleep. Alpha has been linked to extroversion (introverts show less), creativity 
(creative subjects show alpha when listening and coming to a solution for
creative problems), and mental work. Alpha is one of the brain’s most important
frequencies for learning and using information taught in the classroom and 
on the job [8]. 


Beta (13-30 Hz) 


Beta activity is considered to be “fast” activity. It reflects desynchronized 
active brain tissue. It is usually seen on both sides in symmetrical distribution 
and is most evident frontally; however, it may be absent or distorted in areas 
of cortical damage. Beta activity is generally regarded as a normal rhythm 
and is the dominant rhythm in those who are alert or anxious or who have 
their eyes open. It is the state that most of the brain is in when we have our
eyes open and are listening and thinking during analytical problem solving, 
judgment, decision making, or processing information about the world around 
us [8]. 


Gamma (above 30 Hz) 


Gamma activity is the only frequency group found in every part of the brain. 
When the brain needs to simultaneously process information from different 
areas, it is hypothesized that the 40 Hz activity consolidates the required 
areas for simultaneous processing. A good memory is associated with well- 
regulated and efficient 40 Hz activity, whereas a 40 Hz deficiency creates 
learning disabilities [8]. 


4.1.3 Chaos in Brain 


The term “chaos” in the theory of nonlinear dynamics is associated with 
exponential divergence of trajectories in phase space, which can reflect sen- 
sitivity to initial conditions. The evolution of chaos theory over the past two 
decades has mostly dealt with apparently simple systems with few degrees of 
freedom that can exhibit chaotic behavior. The brain is certainly a complex 
high-dimensional system, which has been proven to exhibit high-dimensional 
chaos associated with many brain state variables. Theories of chaos in the brain
suggest that phenomena characteristic of many complex nonlinear systems
(e.g., self-organization, interactions at multiple temporal and spatial scales, 
stable spatial structure in the presence of temporal chaos, etc.) occur in neo- 
cortex and may be closely aligned with cognitive processing. One of the most 
important concepts in the brain theories is that linear or quasilinear phe- 
nomena at one scale can coexist with highly nonlinear phenomena at another 
scale. This belief lies in communicating several general concepts, which are
apparently critical to brain dynamic function, to disparate fields. 

A proof of concept about the nonstationarity of EEG time series comes
from the following observation. A model of locally chaotic EEG dynamics with
globally linear dynamics does not account for the evidence of nonstationary
temporal signals in limit set dimension. In this study, however, we deal with
multichannel EEG time series and embed them in a higher dimension, so that
the globally chaotic EEG is treated as a nonlinear dynamical process in its
spatiotemporal aspects. This concept suggests that the dynamical properties
of EEG time series in our model
demonstrate significant large-scale temporal and spatial heterogeneity in cor- 
tical function, which can produce macroscopic chaos. Our approaches based 
on the theory of nonlinear dynamics seem to address the evidence for non- 
stationarity of EEG time series as a function of spatiotemporal correlation of 
coupled oscillators. A macroscopic model of EEG should address some issues 
that arise specifically at the transition from locally chaotic dynamics to the 
macroscopic scale. In this study, we consider the system of nonlinear EEG dy- 
namics, including standing waves with frequency in the 10 Hz range (within 
a factor of about two or three), which will give us insights on the increas- 
ing frequency with maturation of the alpha rhythm and negative correlation 
between brain size and alpha frequency during seizure episodes. 

The organization of the succeeding sections of this chapter is as follows. 
The background of previous studies in neocortical dynamics, neurodynamics 
of the brain, and the measures of chaoticity are described in Section 4.2. In 
Section 4.3, the design of experiment in this study is described as well as the 
computational methods to test the hypothesis. The results are described in 
Section 4.4. The conclusions and discussion are addressed in Section 4.5. 


4.2 Neurodynamics of the Brain 


In the last decade, time series analysis based on chaos theory and the theory 
of nonlinear dynamics, which are among the most interesting and growing 
research topics, has been applied to time series data with some degree of suc- 
cess. The concepts of chaos theory and the theory of nonlinear dynamics have 
not only been useful to analyze specific systems of ordinary differential equa- 
tions or iterated maps, but have also offered new techniques for time series 
analysis. Moreover, a variety of experiments has shown that a recorded time 
series is driven by a deterministic dynamical system with a low-dimensional 
chaotic attractor, which is defined as the phase space point or set of points 
representing the various possible steady-state conditions of a system, an equi- 
librium state or group of states to which a dynamical system converges. Thus, 
the theories of chaos and nonlinear dynamics have provided new theoretical 
and conceptual tools that allow us to capture, understand, and link the com- 
plex behaviors of simple systems. Characterization and quantification of the 
dynamics of nonlinear time series are also important steps toward under-
standing the nature of random behavior and may enable us to predict the 
occurrences of some specific events that follow temporal dynamical patterns 
in the time series. 

Several quantitative system approaches incorporating statistical techniques and
nonlinear methods based on chaos theory have been successfully used to 
study epilepsy because the aperiodic and unstable behavior of the epilep- 
tic brain is suitable to nonlinear techniques that allow precise tracking of 
the temporal evolution. Our previous studies have shown that seizures are 
deterministic rather than random. Consequently, studies of the spatiotem- 
poral dynamics in long-term intracranial EEGs, from patients with tem- 
poral lobe epilepsy, demonstrated the predictability of epileptic seizures; 
that is, seizures develop minutes to hours before clinical onset. The pe- 
riod of a seizure’s development is called a pre-ictal transition period, which 
is characterized by gradual dynamical changes in EEG signals of critical 
electrode sites approximately 1/2 to 1 hour duration before the ictal on- 
set [4, 22, 17, 20, 14, 18, 15, 26, 29, 34, 33]. During a pre-ictal transition 
period, gradual dynamical changes can be exposed by a progressive conver- 
gence (entrainment) of dynamical measures (e.g., short-term maximum Lya-
punov exponents, $STL_{\max}$) at specific anatomical areas and cortical sites, in
the neocortex and hippocampus. Another measure we have used in the state
space created from the EEG at individual electrode sites in the brain, average
angular frequency ($\Omega$), has produced promising results too. The value of $\Omega$
quantifies the average rate of the temporal change in the state of a system
and is measured in rad/sec. Although the existence of the pre-ictal transition
period has recently been confirmed and further defined by other investigators 
[9, 10, 23, 31, 24, 29], the characterization of this spatiotemporal transition 
is still far from complete. For instance, even in the same patient, a different 
set of cortical sites may exhibit a pre-ictal transition from one seizure to 
the next. In addition, this convergence of the normal sites with the epilep- 
togenic focus (critical cortical sites) is reset after each seizure [4, 21, 18, 29]. 
Therefore, complete or partial post-ictal resetting of pre-ictal transition of 
the epileptic brain affects the route to the subsequent seizure, contributing
to the apparently nonstationary nature of the entrainment process. In those 
studies, however, the critical site selections are not trivial but extremely im- 
portant because most groups of brain sites are irrelevant to the occurrences 
of the seizures and only certain groups of sites have dynamical convergence 
in the pre-ictal transition. 

Because the brain is a nonstationary system, algorithms used to estimate 
measures of the brain dynamics should be capable of automatically identify- 
ing and appropriately weighing existing transients in the data. In a chaotic 
system, orbits originating from similar initial conditions (nearby points in 
the state space) diverge exponentially (expansion process). The rate of diver- 
gence is an important aspect of the system dynamics and is reflected in the 
value of Lyapunov exponents and dynamical phase. Estimates of Lyapunov 
exponents and phase velocity of EEG time series are shown to be consistent
with a global theory of EEG in which waves are partly transmitted along cor- 
ticocortical fibers in the brain. General features of neocortical dynamics and 
their implications for theoretical descriptions are considered. Such features 
include multiple scales of interaction, multiple connection lengths, local and 
global time constants, dominance of collective interactions at most scales, and 
periodic boundary conditions. Epileptiforms and other epileptic activities in 
EEG time series may be directly related to the continuous forming and re- 
forming of local and regional circuits that are functionally disconnected from 
tissue involved in global operation. 


4.2.1 Estimation of Lyapunov Exponents 


The method we developed for estimation of short-term largest Lyapunov
exponents ($STL_{\max}$), an estimate of $L_{\max}$ for nonstationary data, is explained
in detail elsewhere [13, 16, 36]. Herein we present only a short description of
our method. Construction of the embedding phase space from a data segment
$x(t)$ of duration $T$ is made with the method of delays. The vectors $X_i$ in the
phase space (see Figure 4.4) are constructed as


$$X_i = (x(t_i), x(t_i + \tau), \ldots, x(t_i + (p - 1) \cdot \tau)), \qquad (4.1)$$


Fig. 4.4 Diagram illustrating the estimation of $STL_{\max}$ measures in the state space. The
fiducial trajectory and the first three local Lyapunov exponents ($L_1$, $L_2$, $L_3$) are shown.

where $\tau$ is the selected time lag between the components of each vector in
the phase space, $p$ is the selected dimension of the embedding phase space,
and $t_i \in [1, T - (p - 1)\tau]$. If we denote by $L$ the estimate of the short-term
largest Lyapunov exponent $STL_{\max}$ then:


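$$L = \frac{1}{N_a \Delta t} \sum_{i=1}^{N_a} \log_2 \frac{|\delta X_{i,j}(\Delta t)|}{|\delta X_{i,j}(0)|} \qquad (4.2)$$
with
$$\delta X_{i,j}(0) = X(t_i) - X(t_j), \qquad (4.3)$$
$$\delta X_{i,j}(\Delta t) = X(t_i + \Delta t) - X(t_j + \Delta t), \qquad (4.4)$$
where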

• $X(t_i)$ is the point of the fiducial trajectory $\phi_t(X(t_0))$ with $t = t_i$,
$X(t_0) = (x(t_0), \ldots, x(t_0 + (p - 1) \cdot \tau))$, and $X(t_j)$ is a properly chosen vector
adjacent to $X(t_i)$ in the phase space (see below).

• $\delta X_{i,j}(0) = X(t_i) - X(t_j)$ is the displacement vector at $t_i$, that is, a pertur-
bation of the fiducial orbit at $t_i$, and $\delta X_{i,j}(\Delta t) = X(t_i + \Delta t) - X(t_j + \Delta t)$
is the evolution of this perturbation after time $\Delta t$.

• $t_i = t_0 + (i - 1) \cdot \Delta t$ and $t_j = t_0 + (j - 1) \cdot \Delta t$, where $i \in [1, N_a]$ and
$j \in [1, N]$ with $j \neq i$.

• $\Delta t$ is the evolution time for $\delta X_{i,j}$, that is, the time one allows $\delta X_{i,j}$ to
evolve in the phase space. If the evolution time $\Delta t$ is given in sec, then $L$
is in bits per second.

• $t_0$ is the initial time point of the fiducial trajectory and coincides with
the time point of the first data in the data segment of analysis. In the
estimation of $L$, for a complete scan of the attractor, $t_0$ should move within
$[0, \Delta t]$.

• $N_a$ is the number of local $L_{\max}$'s estimated within a duration $T$ data seg-
ment. Therefore, if $D_t$ is the sampling period of the time domain data,
$T = (N - 1) D_t = N_a \Delta t + (p - 1)\tau$.
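
The quantities defined above can be put together in a short Python sketch. It builds the delay vectors of (4.1) and averages the base-2 logarithm of the displacement growth as in (4.2). This is only an illustration under stated assumptions: the adjacent vector $X(t_j)$ is taken to be the nearest neighbour of $X(t_i)$ and is re-selected at every step, which is simpler than the replacement procedure of [13] or of Wolf et al. [36]. The function names and the `min_sep` parameter are hypothetical.

```python
import math


def delay_embed(x, p, tau):
    """Build X_i = (x[i], x[i + tau], ..., x[i + (p - 1) * tau]) as in (4.1)."""
    n_vectors = len(x) - (p - 1) * tau
    return [tuple(x[i + m * tau] for m in range(p)) for i in range(n_vectors)]


def distance(a, b):
    """Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def short_term_lyapunov(x, p, tau, evolve, dt, min_sep=1):
    """Average log2 growth rate of small displacements, following (4.2).

    evolve  -- evolution time Delta t, in samples
    dt      -- sampling period in seconds, so the result is in bits/sec
    min_sep -- smallest allowed |i - j| between fiducial and neighbour
               (an assumption of this sketch)
    """
    vectors = delay_embed(x, p, tau)
    usable = len(vectors) - evolve  # both points must have an evolved image
    total, count = 0.0, 0
    for i in range(0, usable, evolve):
        # Simplification: use the nearest neighbour as the adjacent vector.
        best_j, best_d = None, math.inf
        for j in range(usable):
            if abs(i - j) < min_sep:
                continue
            d = distance(vectors[i], vectors[j])
            if 0.0 < d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            continue
        evolved = distance(vectors[i + evolve], vectors[best_j + evolve])
        if evolved > 0.0:
            total += math.log2(evolved / best_d)
            count += 1
    if count == 0:
        raise ValueError("no usable displacement pairs in this segment")
    return total / (count * evolve * dt)


# Toy check on the logistic map x -> 4x(1 - x), not on EEG data.
series = [0.3]
for _ in range(499):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))
estimate = short_term_lyapunov(series, p=2, tau=1, evolve=1, dt=1.0)
```

For this map the largest Lyapunov exponent is known to be 1 bit per iteration, so `estimate` should land in that vicinity. The nearest-neighbour search is quadratic in the segment length, which is tolerable for short segments but would need a faster search for long recordings.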


We computed the $STL_{\max}$ profiles using the method proposed by Iasemidis
and coworkers [13], which is a modification of the method by Wolf et al. [36].
We call the measure short term to distinguish it from those used to study
autonomous dynamical systems. Modification of Wolf’s algorithm is necessary
to better estimate $STL_{\max}$ in small data segments that include transients,
such as inter-ictal spikes. The modification is primarily in the search
procedure for a replacement vector at each point of a fiducial trajectory. For
example, in our analysis of the EEG, we found that the crucial parameter of
the $L_{\max}$ estimation procedure, in order to distinguish between the pre-ictal,
the ictal, and the post-ictal stages, was not the evolution time $\Delta t$ nor the
angular separation $V_{i,j}$ between the evolved displacement vector $\delta X_{i-1,j}(\Delta t)$
and the candidate displacement vector $\delta X_{i,j}(0)$ (as was claimed in Frank
et al. [11]). The crucial parameter is the adaptive estimation in time and
phase space of the magnitude bounds of the candidate displacement vector
to avoid catastrophic replacements. Results from simulation data of known
attractors have shown the improvement in the estimates of $L$ achieved by
using the proposed modifications [13]. In the pre-ictal state, depicted in Figure
4.5, one can see a trend of $STL_{\max}$ toward lower values over the whole
pre-ictal period, with one prominent drop in the value of $STL_{\max}$ approximately
24 minutes prior to the seizure (denoted by an asterisk in the figure). This
pre-ictal drop in $STL_{\max}$ can be explained as an attempt of the system
toward a new state of fewer degrees of freedom long before the actual seizure [17].


Fig. 4.5 Smoothed $STL_{\max}$ profiles over 2 hours derived from an EEG signal recorded
at RTD2 (patient 1). A seizure (SZ #10) started and ended between the two vertical
dashed lines. The estimation of the $L_{\max}$ values was made by dividing the signal into
nonoverlapping segments of 10.24 sec each, using $p = 7$ and $\tau = 20$ msec for the phase
space reconstruction. The smoothing was performed by a 10 point (1.6 min) moving average
window over the generated $STL_{\max}$ profiles.
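
The segmentation and smoothing steps described in the caption of Figure 4.5 can be sketched in Python as follows. The helper names are hypothetical, and the moving average is taken over consecutive values without padding, which is an assumption since the alignment of the window is not specified. With the 200 Hz sampling frequency quoted later in this chapter, a 10.24 sec segment corresponds to 2048 samples.

```python
def split_segments(signal, segment_len):
    """Cut a signal into consecutive nonoverlapping segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    n_segments = len(signal) // segment_len
    return [signal[k * segment_len:(k + 1) * segment_len]
            for k in range(n_segments)]


def moving_average(values, window):
    """Average each run of `window` consecutive values (no padding)."""
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


# One minute of placeholder samples at 200 Hz, cut into 2048-sample segments.
one_minute = [0.0] * (200 * 60)
segments = split_segments(one_minute, 2048)
```

In practice one $STL_{\max}$ value would be computed per segment, and `moving_average(profile, 10)` would then be applied to the resulting profile.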


4.2.2 Estimation of EEG Phase Velocity 


Motivated by the representation of a state as a vector in the state space,
another chaoticity measure employed in this research uses frequency-wave
spectra to obtain phase velocity estimates. To estimate the phase velocity, we
define the difference in phase between two evolved states $X(t_i)$ and $X(t_i + \Delta t)$
as $\Delta\Phi_i$ [19]. Then, denoting with $\langle \Delta\Phi \rangle$ the average of the local phase
differences $\Delta\Phi_i$ between the vectors in the state space, we have:


$$\langle \Delta\Phi \rangle = \frac{1}{N_\alpha} \cdot \sum_{i=1}^{N_\alpha} \Delta\Phi_i, \qquad (4.5)$$


where $N_\alpha$ is the total number of phase differences estimated from the
evolution of $X(t_i)$ to $X(t_i + \Delta t)$ in the state space, according to:


$$\Delta\Phi_i = \left| \arccos \frac{X(t_i) \cdot X(t_i + \Delta t)}{\|X(t_i)\| \cdot \|X(t_i + \Delta t)\|} \right|. \qquad (4.6)$$

Then, the average angular frequency $\Omega$ is:

$$\Omega = \frac{1}{\Delta t} \cdot \langle \Delta\Phi \rangle. \qquad (4.7)$$


If $\Delta t$ is given in sec, then $\Omega$ is given in rad/sec. Thus, whereas $STL_{\max}$
measures the local stability of the state of the system on average, $\Omega$ measures
how fast a local state of the system changes on average (e.g., dividing $\Omega$ by
$2\pi$, the rate of change of the state of the system is expressed in sec$^{-1}$ = Hz).
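
Equations (4.5) to (4.7) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the evolution time is expressed in samples, and every consecutive pair $X(t_i)$, $X(t_i + \Delta t)$ in the segment is used, which is an assumption about how the $N_\alpha$ phase differences are collected.

```python
import math


def delay_embed(x, p, tau):
    """Build X_i = (x[i], x[i + tau], ..., x[i + (p - 1) * tau]) as in (4.1)."""
    n_vectors = len(x) - (p - 1) * tau
    return [tuple(x[i + m * tau] for m in range(p)) for i in range(n_vectors)]


def average_angular_frequency(x, p, tau, evolve, dt):
    """Return Omega in rad/sec from (4.5)-(4.7).

    evolve -- evolution time Delta t, in samples
    dt     -- sampling period in seconds
    """
    vectors = delay_embed(x, p, tau)
    phases = []
    for i in range(len(vectors) - evolve):
        a, b = vectors[i], vectors[i + evolve]
        norm = (math.sqrt(sum(u * u for u in a))
                * math.sqrt(sum(v * v for v in b)))
        if norm == 0.0:
            continue  # the angle is undefined for a zero vector
        cosine = sum(u * v for u, v in zip(a, b)) / norm
        cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
        phases.append(abs(math.acos(cosine)))  # local difference, (4.6)
    mean_phase = sum(phases) / len(phases)     # average, (4.5)
    return mean_phase / (evolve * dt)          # Omega, (4.7)


# Toy check: a 5 Hz sinusoid sampled at 200 Hz, embedded with p = 2 and a
# quarter-period lag (10 samples), traces a circle in the state space.
sampling_rate = 200.0
wave = [math.sin(2.0 * math.pi * 5.0 * n / sampling_rate) for n in range(400)]
omega_demo = average_angular_frequency(wave, p=2, tau=10, evolve=1,
                                       dt=1.0 / sampling_rate)
```

In this toy case the state vector rotates uniformly, so `omega_demo / (2 * math.pi)` recovers the 5 Hz of the input; EEG data will not be this regular.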

An example of a typical $\Omega$ profile over time is given in Figure 4.6. The
values are estimated from a 60 minute long EEG sample recorded from an
electrode located in the epileptogenic hippocampus. The EEG sample
includes a 2 minute seizure that occurs in the middle of the recording. The
state space was reconstructed from sequential, nonoverlapping EEG data
segments of 2048 points (sampling frequency 200 Hz, hence each segment
of 10.24 sec in duration) with $p = 7$ and $\tau = 4$, as for the estimation of
$STL_{\max}$ profiles [19]. The pre-ictal, ictal, and post-ictal states correspond
to medium, high, and lower values of $\Omega$, respectively. The highest $\Omega$ values
were observed during the ictal period, and higher $\Omega$ values were observed
during the pre-ictal period than during the post-ictal period. This pattern 
roughly corresponds to the typical observation of higher frequencies in the 
original EEG signal ictally, and lower EEG frequencies post-ictally. However, 
these observations can hardly denote a long-term warning of an impending 
seizure. The estimates of phase velocity of EEG data exhibit evidence for 
an underlying characteristic velocity of the EEG time series along the corti- 
cal surface. These results support the general theoretical idea that an EEG 
is composed of traveling waves, which are partly combined to form stand- 
ing waves and propagated along corticocortical fibers. However, the results 
demonstrate that the existence of waves at such large scales does not pre- 
clude the simultaneous existence of waves at several smaller scales in which 
propagation can be adequately described in terms of exclusively intracortical 
interactions. 

Fig. 4.6 A typical $\Omega$ profile before, during and after an epileptic seizure, estimated from
the EEG recorded from a site in the epileptogenic hippocampus; the seizure occurred
between the vertical lines.


4.2.3 Spatiotemporal Dynamics


Based on the estimated STLinax and 2 profiles at individual cortical sites, 
the temporal evolution of the stability of each cortical site is quantified. 
However, the system under consideration (brain) has a spatial extent and, 
as such, information about the transition of the system towards the ictal 
state should also be included in the interactions of its spatial components. 
The spatial dynamics of this transition are captured by consideration of the 
relations of the STLmax (and 2) between different cortical sites. For ex- 
ample, if a similar transition occurs at different cortical sites, the ST Imax 
of the involved sites are expected to converge to similar values prior to the 
transition. We have called such participating sites “critical electrode sites.” 
We have used periods of 10 minutes (ie., moving windows including ap- 
proximately 60 STZyax values over time at each electrode site) to test the 
convergence at the 0.01 statistical significance level. We employed the T- 
index (from the well-known paired T-statistics for comparisons of means) 
as a measure of distance between the mean values of pairs of STLmax pro- 
files over time. The T-index at time t between electrode sites 7 and 7 is de- 
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fined as 
T,,;(t) = VN x |E{STLinax,i — ST Linax,j}| /%,3(€), (4.8) 


where F{-} is the sample average difference for the STLymax,; — STLmax,; 
estimated over a moving window w;(A) defined as 


fl. a xer=n-i4, 
wld) = 4 if \ ¢ [tN — 1,4], 


where N is the length of the moving window. Then, $\sigma_{i,j}(t)$ is the sample
standard deviation of the STLmax differences between electrode sites i and
j within the moving window $w_t(\lambda)$. The T-index thus defined follows a
t-distribution with N - 1 degrees of freedom. For the estimation of the
$T_{i,j}(t)$ indices in our data we used N = 60 (i.e., an average of 60
differences of STLmax exponents between sites i and j per moving window of
approximately 10 minutes' duration). Therefore, a two-sided t-test with
N - 1 (= 59) degrees of freedom at a statistical significance level α should
be used to test the null hypothesis, $H_0$: “brain sites i and j acquire
identical STLmax values at time t.” In this experiment, we set α = 0.01;
that is, the probability of a type I error (falsely rejecting $H_0$ when
$H_0$ is true) is 1%. For the T-index to pass this test, the $T_{i,j}(t)$
value should be within the interval (0, 2.662].
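
The T-index test above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `t_index` and `entrained` are hypothetical, the window is taken to be the N samples ending at index t (an assumption about how the moving window is indexed), and the critical value 2.662 is the one quoted in the text.

```python
import math

T_CRITICAL = 2.662  # two-sided t-test, alpha = 0.01, 59 degrees of freedom (from the text)

def t_index(stl_i, stl_j, t, n=60):
    """T-index between two STLmax profiles over the n samples ending at index t."""
    if t + 1 < n:
        raise ValueError("not enough samples before t for a full window")
    # Pairwise differences STLmax_i - STLmax_j inside the window.
    diffs = [a - b for a, b in zip(stl_i[t - n + 1:t + 1], stl_j[t - n + 1:t + 1])]
    mean = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.inf
    return math.sqrt(n) * abs(mean) / sd

def entrained(stl_i, stl_j, t, n=60):
    """True when the null hypothesis of identical mean STLmax is not rejected."""
    return t_index(stl_i, stl_j, t, n) <= T_CRITICAL
```

Two profiles whose differences fluctuate around zero give a T-index near 0 and count as entrained; shifting one profile by a constant offset that is large relative to the spread of the differences pushes the T-index far above 2.662.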


4.2.4 Optimization in the Brain Neurodynamics 


Having quantified the spatiotemporal dynamics of the brain, we propose
optimization techniques to identify the critical electrode sites. This
problem can be naturally modeled as a quadratic 0-1 program, a formulation
that has been extensively used to study Ising spin glass models
[1, 2, 3, 12, 25]. Specifically, the critical electrode selection problem is
formulated as a quadratic 0-1 knapsack problem whose objective function
minimizes the average T-index (a measure of statistical distance between the
mean values of STLmax) among electrode sites and whose knapsack constraint
fixes the number of critical cortical sites. The problem is formulated as
follows.

Let A be an n × n matrix whose element $a_{i,j}$ represents the T-index
between electrodes i and j within a 10 minute window before the onset of a
seizure. Define $x = (x_1, \ldots, x_n)$, where each $x_i$ represents the
cortical electrode site i. If the cortical site i is selected to be one of
the critical electrode sites, then $x_i = 1$; otherwise, $x_i = 0$.

A quadratic function is defined on $\mathbb{R}^n$ by


$$\min f(x) = x^T A x, \quad \text{s.t. } x_i \in \{0,1\},\ i = 1, \ldots, n, \tag{4.9}$$

where A is an n × n matrix [27, 28]. Next, we add a linear constraint,
$\sum_{i=1}^{n} x_i = k$, where k is the number of critical electrode sites that we want
to select. We now consider the following linearly constrained quadratic 0-1
problem:


$$P: \ \min f(x) = x^T A x, \quad \text{s.t. } \sum_{i=1}^{n} x_i = k \text{ for some } k,\ x \in \{0,1\}^n,\ A \in \mathbb{R}^{n \times n}.$$


Problem P can be formulated as a quadratic 0-1 problem of the form (4.9) by
using an exact penalty. If $A = (a_{ij})$, then let


$$M = 2\left[\sum_{j=1}^{n}\sum_{i=1}^{n} |a_{ij}|\right] + 1.$$


Then, we have the following equivalent problem $\bar{P}$.


$$\bar{P}: \ \min g(x) = x^T A x + M\left(\sum_{i=1}^{n} x_i - k\right)^2, \quad \text{s.t. } x \in \{0,1\}^n,\ A \in \mathbb{R}^{n \times n}.$$


Such a problem can be solved by applying a branch and bound algorithm 
with a dynamic rule for fixing variables [27, 28]. 
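
The equivalence between the constrained problem and its penalized form can be checked by brute force on a small instance. The sketch below is only an illustration under stated assumptions: it enumerates all 0-1 vectors instead of using the branch and bound algorithm of [27, 28], and the helper names `quad`, `solve_constrained`, and `solve_penalized` are hypothetical.

```python
from itertools import product

def quad(A, x):
    """Evaluate the quadratic form x^T A x for a 0-1 vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_constrained(A, k):
    """Minimize x^T A x over 0-1 vectors with exactly k ones (enumeration)."""
    feasible = (x for x in product((0, 1), repeat=len(A)) if sum(x) == k)
    return min(feasible, key=lambda x: quad(A, x))

def solve_penalized(A, k):
    """Minimize x^T A x + M (sum(x) - k)^2 over all 0-1 vectors (enumeration)."""
    # Penalty weight from the text: twice the sum of |a_ij|, plus one.
    M = 2 * sum(abs(a) for row in A for a in row) + 1
    return min(product((0, 1), repeat=len(A)),
               key=lambda x: quad(A, x) + M * (sum(x) - k) ** 2)
```

Because any violation of the cardinality constraint costs at least M, which exceeds the largest possible change in the quadratic term, both functions return the same selection whenever the constrained minimizer is unique.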

Previous studies by our group have shown the existence of resetting of the
brain after seizure onset [35, 21, 32], that is, divergence of STLmax profiles
after seizures. Therefore, to ensure that the optimal group of critical sites
shows this divergence, we reformulate the optimization problem by adding
one more quadratic constraint. The quadratically constrained quadratic
zero-one problem is given by:


\begin{align}
\min\ & x^T A x \tag{4.10} \\
\text{s.t. } & \sum_{i=1}^{n} x_i = k, \tag{4.11} \\
& x^T B x \ge T_\alpha\, k(k-1), \tag{4.12}
\end{align}


where $x_i \in \{0,1\}\ \forall i \in \{1, \ldots, n\}$.

Note that the matrix $B = (b_{ij})$ is the T-index matrix of brain sites i and j
within 10 minute windows after the onset of a seizure. $T_\alpha$ is the critical value
of the T-index, as previously defined, to reject $H_0$: “two brain sites acquire
identical Ω values within time window $w_t(\lambda)$.”

With one more quadratic constraint, the quadratic 0-1 problem becomes
much harder to solve. In particular, a branch and bound algorithm with a
dynamic rule for fixing variables cannot be applied to this problem because
of the additional quadratic constraint [27, 28]. A conventional linearization
has been proposed to solve this problem: a new variable is introduced for
each product of two variables, some additional constraints are added, and
the problem is then formulated as a mixed-integer linear programming (MILP)
problem. Specifically, for each product $x_i x_j$, we introduce a new 0-1
variable, $x_{ij} = x_i x_j$ ($i \ne j$). Note that $x_{ii} = x_i^2 = x_i$ for
$x_i \in \{0,1\}$. The equivalent MILP problem is given by


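\begin{align}
\min\ & \sum_{i}\sum_{j} a_{ij} x_{ij} \tag{4.13} \\
\text{s.t. } & \sum_{i=1}^{n} x_i = k, \tag{4.14} \\
& x_{ij} \le x_i, \quad \text{for } i,j = 1,\ldots,n\ (i \ne j) \tag{4.15} \\
& x_{ij} \le x_j, \quad \text{for } i,j = 1,\ldots,n\ (i \ne j) \tag{4.16} \\
& x_i + x_j - 1 \le x_{ij}, \quad \text{for } i,j = 1,\ldots,n\ (i \ne j) \tag{4.17} \\
& \sum_{i}\sum_{j} b_{ij} x_{ij} \ge T_\alpha\, k(k-1), \tag{4.18}
\end{align}


where $x_i \in \{0,1\}$ and $0 \le x_{ij} \le 1$, $i,j = 1, \ldots, n$.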
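
To see why constraints (4.15)-(4.17) reproduce the product of two 0-1 variables, one can check the interval they leave for each linearized variable. The following is a small sketch; the function names are hypothetical, and the variable count assumes one new variable per ordered pair with i ≠ j.

```python
from itertools import product

def linearized_bounds(xi, xj):
    """Interval allowed for x_ij by (4.15)-(4.17) together with 0 <= x_ij <= 1."""
    lower = max(0, xi + xj - 1)   # from (4.17) and x_ij >= 0
    upper = min(xi, xj, 1)        # from (4.15), (4.16) and x_ij <= 1
    return lower, upper

def count_new_variables(n):
    """Number of product variables x_ij introduced for i != j."""
    return n * (n - 1)

# For binary x_i, x_j the interval collapses to the single point x_i * x_j.
all_tight = all(linearized_bounds(xi, xj) == (xi * xj, xi * xj)
                for xi, xj in product((0, 1), repeat=2))
```

For 30 electrode sites this already introduces 870 product variables and three constraints for each of them, which illustrates why the formulation grows quickly with n.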

Although the above linearization technique can be used to solve the
quadratically constrained quadratic integer program, it becomes
computationally inefficient as n increases, and a better linearization
technique has been proposed [5]. A more efficient MILP for the electrode
selection problem, proposed in [5, 26], is given by


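\begin{align}
\min\ & \sum_{i=1}^{n} s_i \tag{4.19} \\
\text{s.t. } & \sum_{i=1}^{n} x_i - k = 0, \tag{4.20} \\
& -\sum_{j=1}^{n} a_{ij} x_j + y_i + s_i = 0, \quad \text{for } i = 1,\ldots,n \tag{4.21} \\
& y_i - M'(1 - x_i) \le 0, \quad \text{for } i = 1,\ldots,n \tag{4.22} \\
& h_i - M x_i \le 0, \quad \text{for } i = 1,\ldots,n \tag{4.23} \\
& -\sum_{j=1}^{n} b_{ij} x_j + h_i \le 0, \quad \text{for } i = 1,\ldots,n \tag{4.24} \\
& \sum_{i=1}^{n} h_i \ge T_\alpha\, k(k-1), \tag{4.25}
\end{align}


where $x_i \in \{0,1\}$ and $s_i, y_i, h_i \ge 0$, for $i = 1,\ldots,n$, and $M' = \|A\|_\infty$ and
$M = \|B\|_\infty$.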
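
The compact formulation can be sanity-checked on a toy instance without an MILP solver. The sketch below assumes nonnegative T-index matrices A and B, pairs M' with the y constraint and M with the h constraint, and uses hypothetical helper names. For a fixed 0-1 vector x it computes the smallest feasible s and the largest feasible h in closed form, then enumerates x.

```python
from itertools import product

def quad(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def inf_norm(A):
    """Matrix infinity norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def best_s_and_h(A, B, x):
    """Smallest s and largest h allowed by (4.21)-(4.24) for a fixed 0-1 vector x."""
    n = len(x)
    m_prime, m = inf_norm(A), inf_norm(B)
    s, h = [], []
    for i in range(n):
        row_a = sum(A[i][j] * x[j] for j in range(n))
        row_b = sum(B[i][j] * x[j] for j in range(n))
        y_i = min(row_a, m_prime * (1 - x[i]))  # largest y_i keeping s_i >= 0
        s.append(row_a - y_i)                   # s_i from (4.21)
        h.append(min(row_b, m * x[i]))          # h_i bounded by (4.23) and (4.24)
    return s, h

def select_sites(A, B, k, t_alpha):
    """Enumerate 0-1 vectors; return (objective, x) of the best feasible selection."""
    best = None
    for x in product((0, 1), repeat=len(A)):
        if sum(x) != k:
            continue
        s, h = best_s_and_h(A, B, x)
        if sum(h) < t_alpha * k * (k - 1):      # constraint (4.25)
            continue
        if best is None or sum(s) < best[0]:
            best = (sum(s), x)
    return best
```

Under these assumptions the sum of the s values equals the quadratic form in A and the sum of the h values equals the quadratic form in B, so only n extra variables of each kind are needed instead of one per pair of sites.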


4.3 Design of Experiments


In this study, we investigate the properties of the brain neurodynamics from
continuous long-term multichannel intracranial EEG recordings that had
been acquired from two patients with medically intractable temporal lobe
epilepsy. The recordings were obtained as part of a pre-surgical clinical
evaluation using Nicolet BMSI 4000 and 5000 recording systems, with a 0.1 Hz
high-pass and a 70 Hz low-pass filter. Each record included a total of 28 to
32 intracranial electrodes (8 subdural and 6 hippocampal depth electrodes for
each cerebral hemisphere).

In this framework, we filter the EEG data and then estimate STLmax and
phase velocity, which measure the order or disorder of EEG signals recorded
from individual electrode sites. Because the EEG consists of several frequency
components (as described in the introduction) embedded in a single time
series, bandpass filters are used to extract a particular frequency band from
the EEG. In this experiment, Butterworth filters are used to isolate the
specified frequency bands (Delta, Theta, Alpha, Beta, and Gamma); we use
10th-order Butterworth filters for each frequency band. Filter parameters
were generated by using the butter command in MATLAB. The EEG from each
channel was then passed through each filter, as shown in Figure 4.7.

In these experiments, we aim to investigate whether the mechanism of
epileptogenesis implies a specific function for each frequency band response.
Stimuli from epileptogenesis processes are likely to elicit dynamical changes
in intracortical interactions. Such stimulation might trigger neurodynamical
processes, and we investigate whether additional processes will follow.

Fig. 4.7 Filter design of the proposed analysis for each frequency band.


4.3.1 Hypothesis I


To determine whether synchronized and/or repetitive activity of neurons
serves a specific epileptogenesis process, one must compare brain responses
in two states (normal and abnormal), only one of which induces changes in
neurodynamics. In this study, we speculate that the chaoticity levels of
different frequency bands of EEG data moving in the same direction lead to
synchronous and fast oscillatory activity of neurons activated by the
epileptogenesis stimuli. We hypothesize that filtering EEG data with
different cutoff frequencies might reveal different synchronous and fast
oscillatory activity of the EEG, which can be reflected in the average value
of the STLmax profiles. To test this hypothesis, we implement a narrowband
lowpass filter (LPF) using an IIR filter. Note that designing this filter is
a very difficult task (although the resulting filter is straightforward to
implement). One reason is that it is extremely hard for a filter to achieve
such a sharp response. Another reason is that a perfectly linear phase
(constant group delay) cannot be realized using IIR filtering. There are
several approaches to designing an approximately linear phase IIR filter.
One widely used approach is to iteratively design filters that simultaneously
minimize errors in magnitude and group delay. Another widely used approach is
to design an IIR filter that approximates the desired magnitude response
(e.g., an elliptic filter) and then design an IIR all-pass filter that
compensates for the nonlinear phase.

In this study, we implement IIR filtering procedures using the FD toolbox
in MATLAB, which makes it possible to design a very good LPF magnitude
response using an elliptic filter. This technique allows us to keep the poles
within a circle of a specified radius. This approach employs a least-pth
algorithm that attempts to minimize an $L_p$-norm error. The $L_p$-norm
error of the magnitude response is given by:


$$\|H\|_p = \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(\omega) - H_d(\omega)|^p\, W(\omega)\, d\omega \right)^{1/p}, \tag{4.26}$$


where H is the actual frequency response, $H_d$ is the desired response, and W
is some weighting function. In practice, the weighting function is normally
equal to 1 over the passband and stopband and 0 in the transition band.
Note that minimizing the $L_2$-norm is equivalent to minimizing the root mean
square (RMS) error in the magnitude. In contrast, minimizing the
$L_\infty$-norm is equivalent to minimizing the maximum error over the
frequencies of interest. Once the magnitude response in our experiment has
been set, we need to perform group delay equalization to yield approximately
constant group delay, using a least-pth algorithm to constrain the radius of
the poles.
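
The role of the exponent p in the weighted error can be illustrated numerically. The sketch below approximates the integral with a midpoint rule on [-π, π]; it is not the least-pth design routine of the MATLAB toolbox, and the function name `lp_error`, the grid size, and the callable arguments are assumptions made for the example.

```python
import math

def lp_error(H, Hd, W, p, n_grid=2000):
    """Weighted L_p error between responses H and Hd on [-pi, pi] (midpoint rule)."""
    dw = 2 * math.pi / n_grid
    total = 0.0
    for m in range(n_grid):
        w = -math.pi + (m + 0.5) * dw
        total += abs(H(w) - Hd(w)) ** p * W(w) * dw
    return (total / (2 * math.pi)) ** (1.0 / p)

# A response that misses the target by 0.3 near zero frequency and by 0.1 elsewhere.
actual = lambda w: 1.3 if abs(w) < 0.5 else 1.1
desired = lambda w: 1.0
weight = lambda w: 1.0

rms_like = lp_error(actual, desired, weight, 2)     # averages the error
peak_like = lp_error(actual, desired, weight, 50)   # dominated by the largest error
```

With p = 2 the result behaves like an RMS error, whereas a large p moves it towards the maximum deviation of 0.3, which matches the contrast drawn between the two norms in the text.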

To construct an LPF, we employ the frequency response function of the
Butterworth LPF. This type of filter is especially useful because the random
errors involved in the raw position EEG data obtained through reconstruction
are characterized by relatively high frequency content. The Butterworth
LPF can be expressed by


$$|H_c(j\omega)|^2 = \frac{1}{1 + \left( \dfrac{j\omega}{j\omega_c} \right)^{2n}}, \tag{4.27}$$


where $j = \sqrt{-1}$, ω is the frequency (rad/s), $\omega_c$ is the cutoff frequency
(rad/s), and n is the order of the filter. When ω = 0, the magnitude-squared
function ($H_c^2$) equals 1 and the frequency component is completely passed.
When ω = ∞, $H_c^2$ equals 0 and the frequency component is completely
stopped. Between the passband and the stopband, there is the transition
band ($0 < H_c^2 < 1$) in which the frequency component will be partially
passed but partially stopped at the same time. When $\omega = \omega_c$, $H_c^2$ always
equals 0.5 (half-power) regardless of the order of the filter. The Butterworth
LPF is usually represented by the transfer function of its normalized form,
$|H(j\omega)|^2 = 1/(1 + \omega^{2n})$. The frequency response function of the Butterworth
filter involves complex numbers (jω). Thus, the magnitude-squared function
is the product of the response function pair $H_c(j\omega)$ and $H_c(-j\omega)$, given by


$$|H_c(j\omega)|^2 = H_c(j\omega) \cdot H_c(-j\omega) = \frac{1}{1 + (\omega/\omega_c)^{2n}}.$$
The Butterworth LPF has a rational transfer function; finding the roots of
its denominator gives a transfer function of the form:


$$H(s) = \frac{1}{(s - s_1)(s - s_2)\cdots(s - s_n)} = \frac{1}{s^n + a_{n-1}s^{n-1} + \cdots + a_1 s + 1},$$


where each pole $s_i$ is a root of $1 + (-1)^n s^{2n} = 0$ with negative real part.
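
The properties of the Butterworth response described above can be verified with a short sketch. It works with the normalized prototype (cutoff frequency 1 rad/s) for the poles and is only illustrative: the function names are hypothetical, and the pole-angle formula is the standard one for the left-half-plane roots of $1 + (-1)^n s^{2n} = 0$, not something stated in the text.

```python
import cmath
import math

def butterworth_mag2(w, wc, n):
    """Magnitude-squared response 1 / (1 + (w / wc)^(2n)) of an order-n Butterworth LPF."""
    return 1.0 / (1.0 + (w / wc) ** (2 * n))

def butterworth_poles(n):
    """Left-half-plane roots of 1 + (-1)^n s^(2n) = 0 for the normalized filter."""
    return [cmath.exp(1j * math.pi * (2 * k + n - 1) / (2 * n)) for k in range(1, n + 1)]

def denominator_coeffs(poles):
    """Expand prod (s - s_k) into real coefficients, highest power of s first."""
    coeffs = [1 + 0j]
    for p in poles:
        nxt = coeffs + [0j]          # multiply the current polynomial by s
        for i, c in enumerate(coeffs):
            nxt[i + 1] -= p * c      # subtract p times the current polynomial
        coeffs = nxt
    return [c.real for c in coeffs]

# Half-power point: the value at the cutoff is 0.5 for every order.
half_power = [butterworth_mag2(2.0, 2.0, n) for n in range(1, 11)]
```

For n = 2 the expansion gives the familiar denominator $s^2 + \sqrt{2}\,s + 1$, and for n = 3 it gives $s^3 + 2s^2 + 2s + 1$.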


4.3.2 Hypothesis II


Before the ictal period, there are frequency components whose frequency
responses can be used to verify that the brain is fundamentally parallel.
Neuropsychological and neuroanatomical evidence might suggest that both its
function and its organization are modular. To what extent can parallels be
drawn between the modularity of epileptogenesis processes and the cerebral
localization of function, which might be revealed by an epileptogenic zone?
We hypothesize that this phenomenon is a result of quadratic phase couplings,
which can be manifested through the bispectrum of the EEG signal by
decomposing the Wigner-Ville time-frequency components of the EEG signal.
The frequency components appearing in the spectrogram should form an
imperfect sine wave that creates higher-order harmonics of the signal. By
employing the Choi-Williams time-frequency distribution, we should be able
to capture the frequency responses of the EEG that show only the main
component of EEG signals (not the harmonic component), which is speculated
to reflect the epileptogenesis processes.


Wigner-Ville Distribution


The Wigner-Ville distribution (WVD) is employed to capture the temporal
development of epileptogenesis, which might be reflected in the
time-dependent variation in the amplitude of each frequency band. The WVD
is defined as follows:


$$W(t,f) = \int_{-\infty}^{\infty} h\!\left(t - \frac{\tau}{2}\right) h^{*}\!\left(t + \frac{\tau}{2}\right) e^{2\pi i f \tau}\, d\tau, \tag{4.28}$$


where h(t) is the time series data [7]. In practice, we normally use the
discrete analogue of the previous equation, which is represented by
$W_{jk} = \sum_{\tau=-N}^{N} h_{j-\tau/2}\, h^{*}_{j+\tau/2}\, e^{2\pi i k \tau / N}$. The WVD is the most suitable and
promising time-frequency distribution map for our study because it satisfies
a large number of desirable mathematical properties including: (a) energy
conservation (the energy of h can be obtained by integrating the WVD of
h over the whole time-frequency plane), (b) marginal properties (the energy
spectral density and the instantaneous power can be obtained as marginal
distributions of W(t, f)), and (c) compatibility with filtering (if a signal
y is the convolution of x and z, the WVD of y is the time-convolution of the
WVD of x and the WVD of z).
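
A discrete Wigner-Ville distribution can be sketched directly from the definition. The version below is a minimal, unoptimized illustration rather than the authors' code: it uses integer lags m (so that τ = 2m), truncates the lag range near the ends of the record, and maps frequency bin k to k / (2 · n_freq) cycles per sample. The function name and these conventions are assumptions made for the example.

```python
import cmath

def wigner_ville(h, n_freq):
    """Discrete WVD of a complex sequence h; returns a list of rows W[n][k]."""
    n_samples = len(h)
    W = []
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the record and the frequency grid.
        max_lag = min(n, n_samples - 1 - n, n_freq // 2 - 1)
        row = []
        for k in range(n_freq):
            acc = 0j
            for m in range(-max_lag, max_lag + 1):
                kernel = h[n + m] * h[n - m].conjugate()
                acc += kernel * cmath.exp(-2j * cmath.pi * k * m / n_freq)
            row.append(acc.real)  # the sum is real because kernel(-m) = conj(kernel(m))
        W.append(row)
    return W

# Complex tone at 0.125 cycles per sample; its WVD should peak at bin k = 2 * 0.125 * 16 = 4.
tone = [cmath.exp(2j * cmath.pi * 0.125 * n) for n in range(32)]
W = wigner_ville(tone, 16)
peak_bin = max(range(16), key=W[16].__getitem__)
```

For a single tone the distribution concentrates on one frequency bin at every time index; signals with several components additionally produce the cross-terms that motivate smoothed distributions of Cohen's class.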


Choi-Williams Distribution


The Choi-Williams distribution (CWD) expresses the EEG time series from
the spectral density point of view. The CWD, which is a time-frequency
distribution of Cohen’s class, introduces an exponential function of the time
series (and is also called the exponential distribution). The exponential
kernel is used to control the cross-terms as represented in the generalized
ambiguity function domain. The CWD can also be regarded as the Fourier
transform of the time-indexed autocorrelation function $K(t,\tau)$ estimated
at a given time t. The CWD is given by


$$C(t,f) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i f \tau} K(t,\tau)\, d\tau, \tag{4.29}$$


where

where σ is a factor controlling the suppression of cross-terms and the
frequency resolution. Note that C(t, f) becomes the Wigner-Ville distribution
when σ → ∞. It also satisfies the marginal conditions for all values of
σ. To give reasonable accuracy to the estimate of $K(t,\tau)$, the range of the
time average should be increased for large values of τ [6]. It is worth
mentioning that the Choi-Williams transform is quite expensive in terms of
computational complexity.


4.4 Results 


Based on the experimental design, the results of these hypothesis tests
give more insight into the complex physical components in different
frequency bands of EEG signals. These components are essential features of
neocortical dynamics: multiple scales of interaction with different
frequency bands, dominance of collective interactions at most scales, and
periodic boundary conditions. In addition, the time-frequency component
decomposition may provide insights into the mechanism by which local and
regional circuits of the brain network can continuously form and reform, with
different regions functionally disconnected from other tissue (a form of
self-organization). In neurodynamics theory, the switching between more local
and more global operation in time-frequency components is governed by local
and global control parameters, which are speculated to change under the
influence of various neuromodulators in epileptogenic processes. The
following two sections explain the results of the hypothesis tests and the
resulting observations in this study.


4.4.1 Hypothesis I 


Figure 4.8 illustrates STLmax profiles of the original EEG data and of EEG
data after passing a lowpass filter (LPF) of 10, 20, 30, and 50 Hz. The
brain activity at Delta and Theta bands (under 10 Hz) is visibly more chaotic
than the activity at higher frequency bands. However, during a seizure, the
drop in the chaoticity at Delta and Theta bands is much less prominent than
that of higher frequency bands. After a seizure, we observe activity at the
high Gamma band (over 50 Hz), which makes the EEG signal become less chaotic
(more ordered) 10 minutes after a seizure. It is worth noting that there is
no significant activity in the EEG data between 20 and 50 Hz, which is
considered to be in the high Beta and low Gamma bands. This result verifies
the theory used to categorize frequency bands in EEG data based on
neuroanatomical connections in the brain at higher frequency bands. If
cortical synchrony of EEG data at different frequency bands were an indicator
of epileptogenesis processes, it could be expected that ongoing activity in
the 10-50 Hz bands would invoke more complex integration processes during the
seizure, which is reflected by more pronounced synchronized activity.

Fig. 4.8 STLmax profiles of EEG data from Electrode 1: Unfiltered and after passing
LPF 10, 20, 30, 40, and 50 Hz.


Based on the implication of our previous observation, the STLmax profile
of the filtered (LPF-10) data appears to contain characteristics different
from those at higher frequency bands. We also observed that the EEG activity
at the 20-50 Hz bands does not contain pronounced characteristics associated
with epileptogenic processes. Figure 4.9 illustrates STLmax profiles of the
original EEG data and of EEG data from Electrode 1 (LTD1) after passing a
lowpass filter of 10 and 50 Hz, or a bandpass filter (BPF) of 10 and 20 Hz.
Surprisingly, the EEG activity in the frequency range of 10-20 Hz is much
less chaotic than that at other frequency bands, and the EEG activity in the
frequency range lower than 10 Hz is much more chaotic than that at other
frequency bands. This implies that EEG activity at Alpha and low Beta bands
is much less chaotic than EEG data at all frequency bands. In addition, EEG
activity at the Delta and Theta bands is much more chaotic than EEG data at
all frequency bands. This phenomenon can be explained by the fact that more
and less complex manipulative activity is also correlated with different
patterns of synchronized oscillatory brain activity in different frequency
ranges, and synchronized Delta-Theta activity can be recorded from the motor
and somatosensory cortices of the brain. Furthermore, Delta activity is
related to attention deficit disorder, which could occur in patients with
epilepsy as they sometimes lose their ability to focus or maintain attention.
Figure 4.10 illustrates STLmax profiles of the original EEG data and of EEG
data from Electrode 6 (RTD2) after passing a lowpass filter of 10 and 50 Hz,
or a bandpass filter of 10 and 20 Hz. Note that the findings in Figure 4.10
are consistent with those in Figure 4.9, which confirms our hypothesis.

Fig. 4.9 STLmax profiles of EEG data from Electrode 1 after passing LPF 10 Hz, BPF
10-20 Hz, and LPF 50 Hz.

Fig. 4.10 Electrode 6 frequency band using LPF 10, BPF 10-20, and LPF 50.

Fig. 4.11 Sweep frequency and STLmax profile of EEG data.

We also observed a drop in the STLmax profiles of the original EEG about 20
minutes after the seizure. This drop cannot be observed if the EEG data
are lowpass filtered at 50 Hz; we postulate that it reflects an enhancement
of gamma-band responses over 50 Hz, which might be attributable to the level
of sensorimotor integration required by complex movements or by the activity
of the brain to recover from seizure episodes. These results are also
consistent with those of an MEG investigation of human brain responses
during the brain recovery period. In addition, the complexity of the brain
activity could also be critical for the gamma-band response to occur, as EEG
recordings are normally used to investigate changes of gamma-band activity
associated with tasks such as verbal and visual-spatial problem solving.
Although these results are consistent with the assumption that cognitive
processes, that is, selective attention to sensorimotor integration, underlie
stronger gamma-band responses, they might also arise from the complexity
of the movement to be performed. If no differences occur in these bands, the
changes of the brain activity during seizure episodes cannot be the cause of
an effect visible in lower frequencies. Figure 4.11 illustrates the sweep
frequency and STLmax profile of EEG data.


4.4.2 Hypothesis II


An example of 5 seconds of EEG data before a seizure onset, analyzed in this
study for Hypothesis II, and its spectrogram are illustrated in Figure 4.12.
We postulate that before the ictal period, there are frequency components
that demonstrate neuropsychological and neuroanatomical evidence of the
modularity of the brain network. In addition, epileptogenic processes are
believed to be a result of quadratic phase couplings in the epileptogenic
zone. Figure 4.13 illustrates the bispectrum of the pre-ictal EEG signal
obtained by decomposing the Wigner-Ville time-frequency components. As shown,
the decomposed frequency components form an imperfect sine wave, which is a
result of higher-order harmonics of the pre-ictal EEG signal. In Figure 4.14,
we employ the Choi-Williams time-frequency distribution to capture the main
component of the frequency responses of the pre-ictal EEG signal, which can
reflect the formation of the epileptogenic process.

Fig. 4.12 An example of 5 sec of EEG data before a seizure onset and its spectrogram.

In this study, we have demonstrated that epileptogenic processes incur
specific changes in high-frequency brain responses when two time-frequency
components are compared. As observed in Figures 4.13 and 4.14, the
time-frequency components have very different perceptual complexity and
differ in evoked spectral responses, which could be a reflection of neuronal
recruitment that triggers the epileptogenic process. On the cortical level,
epileptogenesis should lead to cell assembly ignition, in which gamma-band
responses in the post-ictal period should be stronger than responses in the
other stages (see Figure 4.8).


4.5 Conclusions and Discussion of Future Research 


This research contributes to the study of neocortical dynamics and to the
nonlinear dynamics methods used to study brain dynamics and epileptogenic
processes, which contain almost none of these features to a significant
degree, and it is likely to have a wide spectrum of applications to
successful theories of neocortical function. We have also demonstrated that
successful theories must approximate interactions between neuronal masses at
microscopic and macroscopic levels, consistent with the scale of the
experiment (e.g., electrode size and location), by using modern statistical
methods and methods from chaos theory to express variables at experimentally
interesting scales in terms of degrees of chaoticity over microscopic and
macroscopic variables and their distribution functions at different scales.

W.A. Chaovalitwongse, W. Suharitdamrong, P.M. Pardalos

Fig. 4.13 Wigner-Ville time-frequency of 5 sec EEG data before a seizure onset (axes: time in samples, frequency).

Fig. 4.14 Choi-Williams time-frequency of 5 sec EEG data before a seizure onset.

In the future, we also plan to expand our research to draw more on genuine
neocortical theory, which is based on real anatomy and physiology (as
understood at the time) and contains no free (arbitrary) parameters. Although
this might be an indication that cortical spatiotemporal responses change with
the chaoticity of the EEG, such as continuity, there is still much research
to be done to study how these responses relate to the perception of complex
forms. Nevertheless, we can broadly characterize the brain network in terms
of the dynamics underlying EEG information processing. The brain network
modules are directly mapped onto the spatiotemporal properties of the EEG,
which are governed by attractor dynamics extending over time. In attractor
networks, the changes in the dynamics of the brain network can be reflected
by epileptogenic stimuli, as sets of initial points corresponding to those
stimuli converge onto an attractor. During the pre-ictal and ictal periods,
the complete convergence of the brain dynamics onto an attractor is
postulated to be the ideal network’s response to epileptogenesis, which will
be an infinite transient whose path is governed by the input and the
attractor structure of the brain network.

The results from this study suggest that different EEG frequency bands
serve different functions and have different architectures. This separation
of EEG frequency bands by function and “anatomy” is worth studying because
it may explain correspondences between deficits in the brain network’s
performance when specific brain functioning is damaged, which might be
attributed to the localization of epileptogenesis. In the future, in order to
address issues such as epileptogenesis localization, it is crucially
important to understand representation and processing in connectivity
models of the brain network as a tool for understanding neuropathologies.
We postulated that there are crucial connections during epileptogenic pro- 
cesses governed by the pyramidal cells in which the probability of a synapse 
occurring between two cells varies inversely with the distance between them 
independent of their spatial separation. In addition, those connections can be 
driven by the long-range apical system of the brain connectivity, coordinated 
in its projections between cortical areas in a space-independent fashion. 

If one confines the time scale over which interactions occur to be short, it
might be possible to regard these long-range connections as forming a number
of feedforward systems of epileptogenesis. Ideally, if one can project back
any signals, it is possible in theory that the process would take too long to
arrive to modify the initial feedforward stimulus and that the epileptogenic
processes could be intervened. In addition, one possibility is to regard
spatially localized collections of cells in the neocortex as components of a
system. To monitor the time scale interactions of the brain network, we
assume some randomness in the strength of individual connections; therefore,
the strength of connections between groups of cells is approximately
symmetric over a short range on average. Although the probability of
connection between individual cells is space-dependent, this does not imply
multiple connections between pairs of nearby cells.
As fundamentally unstable brain cell units, their behavior is subject to
change independent of their interactions with other neuronal groups.
Furthermore, if the group of neuronal interactions to a symmetrically
connected attractor is conceivable, it will also be highly constrained in the
time scale to operate effectively. Another possible assumption for
epileptogenic localization is that neuronal networks with spatially localized
connections behave as selective dynamic patterns, forming systems in parallel
to reaction-diffusion systems of brain connectivity. If this assumption
holds, then the time and space constants for the decay of activity, which
depend on the relative strengths and distributions of excitatory and
inhibitory neurons and on the level of “background” neuronal action potential
activity in the system, become critical. For this reason, in the future, we
need to find an explanation of how the brain functions associated with
epileptogenesis may be separated by time scales rather than by anatomy. One
may explain neuropsychological dissociations of function as neurochemical
systems operating over different time courses as anatomical epileptogenic
localizations. Sometimes it may be appropriate to equate “lesions” of
connectivity models with localized epileptogenic zones. Nevertheless, brain
dysfunction may be attributable to epileptogenesis in remote or diffuse
modulatory systems. Greater understanding is also required in drawing
inferences from neuropsychology to support modular connectivity models of
epileptogenesis and to explain the basis of neuropathologies of epileptogenic
zones.

The studies of the connections of neocortical dynamics and neuronal net- 
works also need to be addressed in order for us to have greater understanding 
about the brain. Our group has done some preliminary studies focusing on 
the idea that connections between assumed functional units at different scales 
(e.g., neurons, minicolumns, macrocolumns) can be symmetric or asymmet- 
ric, depending on the spatial scale of such units [30]. In a realm of network 
connectivity studies, the nature of connections at different scales is a criti- 
cal theoretical issue. Also, features other than symmetry of such interactions 
(e.g., the density of the network’s connections) are important and addressed 
in that study [30]. Because neocortical dynamic variables may behave quite 
differently at different scales, the “interconnectivities” are studied. Further- 
more, interactions across scales are also important, as emphasized in [30]. Any 
theory of EEG constructed at smaller scales must be coarse grained before 
comparisons are made with scalp data. In the future, we plan to repeat the
same analyses, informed by some of these ideas, on scalp EEG data. We expect
that chaotic or quasiperiodic behavior may be observed, depending on the
spatial filter implicit in the experimental methods.


Chapter 5


Nonconvex Optimization for 
Communication Networks 


Mung Chiang 


Summary. Nonlinear convex optimization has provided both an insightful 
modeling language and a powerful solution tool to the analysis and design of 
communication systems over the last decade. A main challenge today is on 
nonconvex problems in these applications. This chapter presents an overview 
on some of the important nonconvex optimization problems in communi- 
cation networks. Four typical applications are covered: Internet congestion 
control through nonconcave network utility maximization, wireless network 
power control through geometric and sigmoidal programming, DSL spectrum 
management through distributed nonconvex optimization, and Internet intra- 
domain routing through nonconvex, nonsmooth optimization. A variety of 
nonconvex optimization techniques are showcased: sum-of-squares program- 
ming through successive SDP relaxation, signomial programming through 
successive GP relaxation, leveraging specific structures in these engineering 
problems for efficient and distributed heuristics, and changing the underlying 
protocol to enable a different problem formulation in the first place.
Collectively, they illustrate three alternatives for tackling nonconvex
optimization for communication networks: going “through” nonconvexity,
“around” nonconvexity, and “above” nonconvexity.


5.1 Introduction


There have been two major “waves” in the history of optimization theory
and its applications: the first started with linear programming (LP) and the
simplex method in the late 1940s, and the second with convex optimization
and the interior point method in the late 1980s. Each has been followed by a
transforming period of an “appreciation-application cycle”: as more people
appreciate the use of LP or convex optimization, more look for such
formulations in various applications; more work then goes into theory,
efficient algorithms, and software; the tools become more powerful; and in
turn more people appreciate their use. Communication systems have benefited
significantly from both waves; the many success stories include multicommodity
flow solutions (e.g., the Bellman-Ford algorithm) from LP, and network utility
maximization and robust transceiver design from convex optimization.
Much of the current research is about the potential of the third wave, on 
nonconvex optimization. If one word is used to differentiate between easy and 
hard problems, convexity is probably the “watershed.” But if a longer descrip- 
tion length is allowed, useful conclusions can be drawn even for nonconvex 
optimization. Indeed, convexity is a very disturbing watershed, because it is 
not a topological invariant under change of variable (e.g., see geometric pro- 
gramming) or higher-dimension embedding (e.g., see sum of squares method). 
A variety of approaches has been proposed to tackle nonconvex optimization 
problems: from successive convex approximation to dualization, from nonlin- 
ear transformation to turn an apparently nonconvex problem into a convex 
problem to characterization of attraction regions and systematically jump- 
ing out of a local optimum, and from leveraging the specific structures of the 
problems (e.g., difference of convex functions, concave minimization, low rank 
nonconvexity) to developing more efficient branch-and-bound procedures. 
Researchers in communications and networking have been examining non- 
convex optimization using domain-specific structures in important problems 
in the areas of wireless networking, Internet engineering, and communication 
theory. Perhaps four typical topics best illustrate the variety of challenging 
issues arising from nonconvex optimization in communication systems: 


• Nonconvex objective to be minimized. An example is congestion control
for inelastic application traffic, where a nonconcave utility function needs
to be maximized.

• Nonconvex constraint set. An example is power control in the low SIR
regime.

• Integer constraints. Two important examples are single path routing and
multiuser detection.

• Constraint sets that are convex but require an exponential number of
inequalities to explicitly describe. An example is optimal scheduling in
multihop wireless networks under certain interference models. The problem of
wireless scheduling will not be discussed in this chapter. Interested readers
can refer to [73] for a unifying framework of the problem.
This chapter overviews the latest results in recent publications about the
first two topics, with a particular focus on showing the connections between
the engineering intuitions about important problems in communication
networks and the state-of-the-art algorithms in nonconvex optimization theory.
Most of the results surveyed here were obtained in 2005-2006, and the
problems are driven by fundamental issues in the Internet, wireless, and
broadband access networks. As this chapter illustrates, even after much
progress in recent years, there are still many challenging mysteries to be
resolved on these important nonconvex optimization problems.




Fig. 5.1 Three major types of approaches when tackling nonconvex optimization problems 
in communication networks: Go (1) through, (2) around, or (3) above nonconvexity. 


It is interesting to point out that, as illustrated in Figure 5.1, there are at 
least three very different approaches to tackle the difficult issue of noncon- 
vexity. 


• Go “through” nonconvexity. In this approach, we try to solve the difficult
nonconvex problem; for example, we may use successive convex relaxations
(e.g., sum-of-squares, signomial programming), utilize special structures in
the problem (e.g., difference of convex functions, generalized
quasiconcavity), or leverage smarter branch and bound methods.

• Go “around” nonconvexity. In this approach, we try to avoid solving the
nonconvex problem; for example, we may discover a change of variables that
turns the seemingly nonconvex problem into a convex one, determine
conditions under which the problem is convex or the KKT point is unique, or
make approximations to make the problem convex.

• Go “above” nonconvexity. In this approach, we try to reformulate the
nonconvex problem in the first place to make it more “solvable” or
“approximately solvable.” We observe that optimization problem formulations
are induced by some underlying assumptions on what the network
architectures and protocols should look like. By changing these assumptions, a
different, much easier-to-solve or easier-to-approximate formulation may
result. We refer to this approach as design for optimizability, which is
concerned with redrawing architectures to make the resulting optimization
problem easier to solve. This approach of changing a hard problem into
an easier one is in contrast to optimization, which tries to solve a given,
possibly difficult, problem.


The four topics chosen in this chapter span a range of application contexts and 
tasks in communication networks. The sources of difficulty in these nonconvex 
optimization problems are summarized in Table 5.1, together with the key 
ideas in solving them and the type of approaches used. For more details 
beyond this brief overview chapter, please refer to the related publications 
[29, 19, 14, 35, 7, 70, 71] by the author and coworkers and the references 
therein. 


Table 5.1 Summary of four nonconvex optimization problems in this chapter


Section | Application                 | Difficulty     | Key idea              | Approach
5.2     | Internet congestion control | Nonconcave U   | Sum of squares        | “Through”
5.3     | Wireless power control      | Posynomial ... | Geometric programming | “Around”
5.4     | DSL spectrum management     | Posynomial ... | Problem structure     | “Around”
5.5     | Internet routing            | Nonconvex ...  | Approximation         | “Above”


5.2 Internet Congestion Control 


5.2.1 Introduction 


Basic Network Utility Maximization 


Since the publication of the seminal paper [37] by Kelly, Maulloo, and Tan in 
1998, the framework of network utility maximization (NUM) has found many 
applications in network rate allocation algorithms and Internet congestion 
control protocols (e.g., surveyed in [45, 60]). It has also led to a systematic 
understanding of the entire network protocol stack in the unifying framework 
of “layering as optimization decomposition” (e.g., surveyed in [13, 49, 44]). 
By allowing nonlinear concave utility objective functions, NUM substantially 
expands the scope of the classical LP-based network flow problems. 
Consider a communication network with $L$ links, each with a fixed capacity
of $c_l$ bps, and $S$ sources (i.e., end-users), each transmitting at a source rate
of $x_s$ bps. Each source $s$ emits one flow, using a fixed set $L(s)$ of links in its
path, and has a utility function $U_s(x_s)$. Each link $l$ is shared by a set $S(l)$ of
sources. Network utility maximization, in its basic version, is the following
problem of maximizing the total utility of the network $\sum_s U_s(x_s)$ over the
source rates $x$, subject to linear flow constraints $\sum_{s \in S(l)} x_s \le c_l$ for all
links $l$:

$$
\begin{array}{ll}
\text{maximize} & \sum_s U_s(x_s) \\
\text{subject to} & \sum_{s \in S(l)} x_s \le c_l, \quad \forall l, \\
& x \succeq 0,
\end{array}
\tag{5.1}
$$

where the variables are $x \in \mathbf{R}^S$.

There are many nice properties of the basic NUM model due to several
simplifying assumptions on the utility functions and flow constraints, which
provide the mathematical tractability of problem (5.1) but also limit its
applicability. In particular, the utility functions $\{U_s\}$ are often assumed to be
increasing and strictly concave functions.

Assuming that $U_s(x_s)$ becomes concave for large enough $x_s$ is reasonable,
because the law of diminishing marginal utility eventually will be effective.
However, $U_s$ may not be concave throughout its domain. In his seminal
paper in 1995, Shenker [57] differentiated inelastic network traffic from elastic
traffic. Utility functions for elastic traffic were modeled as strictly concave
functions. Although inelastic flows with nonconcave utility functions
represent important applications in practice, they have received little attention
and rate allocation among them has only a limited mathematical foundation.
There have been three recent publications [41, 29, 19] (see also earlier work 
in [69, 42, 43] related to the approach in [41]) on this topic. 

In this section, we investigate the extension of the basic NUM to max- 
imization of nonconcave utilities, as in the approach of [19]. We provide a 
centralized algorithm for offline analysis and establishment of a performance 
benchmark for nonconcave utility maximization when the utility function is a 
polynomial or signomial. Based on the semialgebraic approach to polynomial 
optimization, we employ convex sum-of-squares (SOS) relaxations solved by 
a sequence of semidefinite programs (SDP), to obtain increasingly tighter up- 
per bounds on total achievable utility for polynomial utilities. Surprisingly, 
in all our experiments, a very low-order and often a minimal-order relaxation 
yields not just a bound on attainable network utility, but the globally maxi- 
mized network utility. When the bound is exact, which can be proved using 
a sufficient test, we can also recover a globally optimal rate allocation. 


Canonical Distributed Algorithm 


A reason that the assumption of a utility function’s concavity is upheld in 
many papers on NUM is that it leads to three highly desirable mathematical 
properties of the basic NUM: 
• It is a convex optimization problem, therefore the global optimum can be
computed (at least in centralized algorithms) in worst-case polynomial-time
complexity [4].

• Strong duality holds for (5.1) and its Lagrange dual problem. A zero
duality gap enables a dual approach to solve (5.1).

• Maximization of a separable objective function subject to linear constraints
can be conducted by distributed algorithms based on the dual approach.


Indeed, the basic NUM (5.1) is such a “nice” optimization problem that 
its theoretical and computational properties have been well studied since the 
1960s in the field of monotropic programming (e.g., as summarized in [54]). 
For network rate allocation problems, a dual-decomposition-based distributed 
algorithm has been widely studied (e.g., in [37, 45]), and is summarized below. 

Zero duality gap for (5.1) states that solving the Lagrange dual problem is
equivalent to solving the primal problem (5.1). The Lagrange dual problem
is readily derived. We first form the Lagrangian of (5.1):

$$
L(x, \lambda) = \sum_s U_s(x_s) + \sum_l \lambda_l \Bigl( c_l - \sum_{s \in S(l)} x_s \Bigr),
$$

where $\lambda_l \ge 0$ is the Lagrange multiplier (can be interpreted as the link
congestion price) associated with the linear flow constraint on link $l$. Additivity
of total utility and linearity of flow constraints lead to a Lagrangian dual
decomposition into individual source terms:

$$
\begin{aligned}
L(x, \lambda) &= \sum_s \Bigl[ U_s(x_s) - \Bigl( \sum_{l \in L(s)} \lambda_l \Bigr) x_s \Bigr] + \sum_l c_l \lambda_l \\
&= \sum_s L_s(x_s, \lambda^s) + \sum_l c_l \lambda_l,
\end{aligned}
$$

where $\lambda^s = \sum_{l \in L(s)} \lambda_l$. For each source $s$, $L_s(x_s, \lambda^s) = U_s(x_s) - \lambda^s x_s$ only
depends on local $x_s$ and the link prices $\lambda_l$ on those links used by source $s$.

The Lagrange dual function $g(\lambda)$ is defined as the maximized $L(x, \lambda)$ over
$x$. This “net utility” maximization obviously can be conducted distributively
by each source, as long as the aggregate link price $\lambda^s = \sum_{l \in L(s)} \lambda_l$ is available
to source $s$, where source $s$ maximizes a strictly concave function $L_s(x_s, \lambda^s)$
over $x_s$ for a given $\lambda^s$:

$$
x_s^*(\lambda^s) = \operatorname{argmax} \, [U_s(x_s) - \lambda^s x_s], \quad \forall s.
\tag{5.2}
$$

The Lagrange dual problem is

$$
\begin{array}{ll}
\text{minimize} & g(\lambda) = L(x^*(\lambda), \lambda) \\
\text{subject to} & \lambda \succeq 0,
\end{array}
\tag{5.3}
$$
where the optimization variable is $\lambda$. Any algorithm that finds a pair of
primal-dual variables $(x, \lambda)$ that satisfy the KKT optimality condition would
solve (5.1) and its dual problem (5.3). One possibility is a distributed,
iterative subgradient method, which updates the dual variables $\lambda$ to solve
the dual problem (5.3):

$$
\lambda_l(t+1) = \Bigl[ \lambda_l(t) - \alpha(t) \Bigl( c_l - \sum_{s \in S(l)} x_s(\lambda^s(t)) \Bigr) \Bigr]^+, \quad \forall l,
\tag{5.4}
$$

where $t$ is the iteration number and $\alpha(t) > 0$ are step sizes. Certain choices of
step sizes, such as $\alpha(t) = \alpha_0 / t$, $\alpha_0 > 0$, guarantee that the sequence of dual
variables $\lambda(t)$ will converge to the dual optimal $\lambda^*$ as $t \to \infty$. The primal
variable $x(\lambda(t))$ will also converge to the primal optimal variable $x^*$. For a
primal problem that is a convex optimization, the convergence is towards the
global optimum.

The sequence of the pair of algorithmic steps (5.2, 5.4) forms what we refer
to as the canonical distributed algorithm, which solves the network utility
optimization problem (5.1) and the dual (5.3) and computes the optimal
rates $x^*$ and link prices $\lambda^*$.
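
The two steps (5.2) and (5.4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the cited work: the utilities are taken to be $U_s(x_s) = \log x_s$ (so the source step has the closed form $x_s = 1/\lambda^s$), a small constant step size replaces the diminishing step $\alpha_0/t$, and the function name `canonical_num`, the initial prices, and the cap `x_max` are hypothetical choices. The topology is the 2-link, 3-user network with constraints $x_1 + x_2 \le 1$ and $x_1 + x_3 \le 2$.

```python
def canonical_num(links_of_source, capacity, step=0.05, iters=3000, x_max=100.0):
    """Dual subgradient iteration (5.2)/(5.4), assuming U_s(x) = log(x)."""
    n_links = len(capacity)
    lam = [1.0] * n_links                 # initial link prices (assumed)
    x = [0.0] * len(links_of_source)
    for _ in range(iters):
        # Source step (5.2): argmax_x [log x - price * x] = 1 / price.
        for s, path in enumerate(links_of_source):
            price = sum(lam[l] for l in path)      # aggregate path price
            x[s] = min(1.0 / max(price, 1e-9), x_max)
        # Link step (5.4): projected price update driven by spare capacity.
        for l in range(n_links):
            load = sum(x[s] for s, path in enumerate(links_of_source) if l in path)
            lam[l] = max(lam[l] - step * (capacity[l] - load), 0.0)
    return x, lam


# Source 0 uses both links; sources 1 and 2 use one link each (0-indexed).
paths = [(0, 1), (0,), (1,)]
rates, prices = canonical_num(paths, capacity=[1.0, 2.0])
```

For this instance the KKT conditions give $x_1 = 1 - 1/\sqrt{3}$ with both links saturated, so the returned rates should approach roughly (0.42, 0.58, 1.58). Each source only needs the sum of prices along its own path, and each link only needs its own load, which is what makes the iteration distributed.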


Nonconcave Network Utility Maximization 


It is known that for many multimedia applications, user satisfaction may 
assume a nonconcave shape as a function of the allocated rate. For example, 
the utility for voice applications is better described by a sigmoidal function,
with a convex part at low rate and a concave part at high rate, and a single
inflection point $x^0$ (with $U_s''(x^0) = 0$) separating the two parts. Furthermore,
in some other models of utility functions, the concavity assumption on $U_s$
is also related to the elasticity assumption on rate demands by users. When
demands for $x_s$ are not perfectly elastic, $U_s(x_s)$ may not be concave.

Suppose we remove the critical assumption that $\{U_s\}$ are concave
functions, and allow them to be any nonlinear functions. The resulting NUM
becomes a nonconvex optimization problem and is significantly harder to analyze
and solve, even by centralized computational methods. In particular, a local
optimum may not be a global optimum and the duality gap can be strictly
positive. The standard distributed algorithms that solve the dual problem
may produce infeasible or suboptimal rate allocation.
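
A strictly positive duality gap is easy to observe numerically. The following sketch is an illustration only, under assumptions that are not from the source: two sources with the sigmoidal utility $U(x) = 1/(1 + e^{-(x-5)})$ share a single link of capacity $c = 8$, rates are restricted to $[0, c]$, and both the primal optimum and the dual bound are found by brute-force grid search (the function name and grid sizes are arbitrary).

```python
import math


def sigmoid(x):
    """Sigmoidal utility with inflection point at x = 5 (assumed example)."""
    return 1.0 / (1.0 + math.exp(-(x - 5.0)))


def primal_and_dual(c=8.0, n_grid=801, n_lam=501, lam_max=0.5):
    xs = [c * i / (n_grid - 1) for i in range(n_grid)]   # rate grid on [0, c]
    us = [sigmoid(x) for x in xs]
    # Primal: utilities are increasing, so the optimum satisfies x1 + x2 = c.
    primal = max(us[i] + us[n_grid - 1 - i] for i in range(n_grid))
    # Dual: g(lam) = 2 * max_x [U(x) - lam * x] + lam * c, minimized over lam >= 0.
    dual = min(
        2.0 * max(u - lam * x for u, x in zip(us, xs)) + lam * c
        for lam in (lam_max * j / (n_lam - 1) for j in range(n_lam))
    )
    return primal, dual


primal_opt, dual_bound = primal_and_dual()
duality_gap = dual_bound - primal_opt
```

Here the best primal allocation gives the whole link to one source (total utility roughly 0.96), whereas the smallest dual value on the grid is roughly 1.02, so no link price reproduces the primal optimum and a price-based iteration cannot be expected to settle on it.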

There have been several recent publications on distributed algorithms for
nonconcave utility maximization. In [41], a “self-regulation” heuristic is
proposed to avoid the resulting oscillation in rate allocation and is shown to
converge to an optimal rate allocation asymptotically when the proportion of
nonconcave utility sources vanishes. In [29], a set of sufficient conditions and
necessary conditions is presented under which the canonical distributed
algorithm still converges to the globally optimal solution. However, these
conditions may not hold in many cases. These two approaches illustrate the choice
between admission control and capacity planning to deal with nonconvexity
(see also the discussion in [36]). But neither approach provides a theoretically
polynomial-time and practically efficient algorithm (distributed or
centralized) for nonconcave utility maximization.


Fig. 5.2 Some examples of utility functions $U_s(x_s)$: it can be concave or sigmoidal as
shown in the graph, or any general nonconcave function. If the bottleneck link capacity
used by the source is small enough, that is, if the dotted vertical line is pushed to the left,
a sigmoidal utility function effectively becomes a convex utility function.

In [19], using a family of convex semidefinite programming (SDP) relax- 
ations based on the sum-of-squares (SOS) relaxation and the positivstellen- 
satz theorem in real algebraic geometry, we apply a centralized computational 
method to bound the total network utility in polynomial time. A surprising 
result is that for all the examples we have tried, wherever we could verify 
the result, the tightest possible bound (i.e., the globally optimal solution) 
of NUM with nonconcave utilities is computed with a very low-order relax- 
ation. This efficient numerical method for offline analysis also provides the 
benchmark for distributed heuristics. 

These three different approaches: proposing distributed but suboptimal 
heuristics (for sigmoidal utilities) in [41], determining optimality conditions 
for the canonical distributed algorithm to converge globally (for all nonlinear 
utilities) in [29], and proposing an efficient but centralized method to compute 
the global optimum (for a wide class of utilities that can be transformed into 
polynomial utilities) in [19] (and this section), are complementary in the 
study of distributed rate allocation by nonconcave NUM. 
5.2.2 Global Maximization of Nonconcave Network Utility


Sum-of-Squares Method 


We would like to bound the maximum network utility by $\gamma$ in polynomial
time and search for a tight bound. Even without link capacity constraints,
maximizing a polynomial is already an NP-hard problem, but it can
be relaxed into an SDP [58]. This is because testing whether the bounding
inequality $\gamma \ge p(x)$ holds, where $p(x)$ is a polynomial of degree $d$ in $n$
variables, is equivalent to testing the nonnegativity of $\gamma - p(x)$, which can be
relaxed into testing whether $\gamma - p(x)$ can be written as a sum of squares (SOS):
$\gamma - p(x) = \sum_i q_i(x)^2$ for some polynomials $q_i$, where the degree of $q_i$ is less
than or equal to $d/2$. This is referred to as the SOS relaxation. If a
polynomial can be written as a sum of squares, it must be nonnegative, but not
vice versa. Conditions under which this relaxation is tight have been studied
since Hilbert. Determining whether a sum of squares decomposition exists can be
formulated as an SDP feasibility problem, and is thus polynomial-time solvable.
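
As a small worked illustration (the polynomial is chosen here for convenience and does not appear in the source), take the univariate polynomial $p(x) = 2x^2 - x^4$ and try the bound $\gamma = 1$. With the monomial vector $z = [1, x, x^2]^T$:

```latex
\gamma - p(x) \;=\; 1 - 2x^2 + x^4 \;=\; (x^2 - 1)^2 \;=\; z^T Q z,
\qquad
Q = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} \succeq 0 .
```

The matrix $Q$ has eigenvalues 0, 0, and 2, so it is positive semidefinite and $\gamma = 1$ is a certified upper bound on $p$. The bound is attained at $x = \pm 1$, so no smaller $\gamma$ admits a certificate. Searching for a positive semidefinite $Q$ whose entries match the coefficients of $\gamma - p(x)$ is exactly the SDP feasibility problem mentioned above.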

Constrained nonconcave NUM can be relaxed by a generalization of the 
Lagrange duality theory, which involves nonlinear combinations of the con- 
straints instead of linear combinations in the standard duality theory. The 
key result is the positivstellensatz, due to Stengle [62], in real algebraic
geometry, which states that for a system of polynomial inequalities, either there
exists a solution in $\mathbf{R}^n$ or there exists a polynomial which is a certificate
that no solution exists. This infeasibility certificate has recently been shown
to be also computable by an SDP of sufficient size [51, 50], a process that 
is referred to as the sum-of-squares method and automated by the software 
SOSTOOLS [52] initiated by Parrilo in 2000. For a complete theory and many 
applications of SOS methods, see [51] and references therein. 

Furthermore, the bound $\gamma$ itself can become an optimization variable in the
SDP and can be directly minimized. A nested family of SDP relaxations, each 
indexed by the degree of the certificate polynomial, is guaranteed to produce 
the exact global maximum. Of course, given the problem is NP-hard, it is 
not surprising that the worst-case degree of certificate (thus the number of 
SDP relaxations needed) is exponential in the number of variables. What is 
interesting is the observation that in applying SOSTOOLS to nonconcave 
utility maximization, a very low-order, often the minimum-order relaxation 
already produces the globally optimal solution. 


Application of SOS Method to Nonconcave NUM 


Using sum-of-squares and the positivstellensatz, we set up the following
problem whose objective value converges to the optimal value of problem (5.1),
where $\{U_s\}$ are now general polynomials, as the degree of the polynomials
involved is increased.

$$
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \gamma - \sum_s U_s(x_s) - \sum_l \lambda_l(x) \bigl( c_l - \sum_{s \in S(l)} x_s \bigr) \\
& {} - \sum_{j,k} \lambda_{jk}(x) \bigl( c_j - \sum_{s \in S(j)} x_s \bigr) \bigl( c_k - \sum_{s \in S(k)} x_s \bigr) - \cdots \\
& {} - \lambda_{12 \ldots n}(x) \bigl( c_1 - \sum_{s \in S(1)} x_s \bigr) \cdots \bigl( c_n - \sum_{s \in S(n)} x_s \bigr) \ \text{is SOS}, \\
& \lambda_l(x), \ \lambda_{jk}(x), \ \ldots, \ \lambda_{12 \ldots n}(x) \ \text{are SOS}.
\end{array}
\tag{5.5}
$$

The optimization variables are $\gamma$ and all of the coefficients in polynomials
$\lambda_l(x)$, $\lambda_{jk}(x)$, ..., $\lambda_{12 \ldots n}(x)$. Note that $x$ is not an optimization variable; the
constraints hold for all $x$, therefore imposing constraints on the coefficients.
This formulation uses Schmüdgen’s representation of positive polynomials
over compact sets [56].

Let D be the degree of the expression in the first constraint in (5.5). We 
refer to problem (5.5) as the SOS relaxation of order D for the constrained 
NUM. For a fixed D, the problem can be solved via SDP. As D is increased, 
the expression includes more terms, the corresponding SDP becomes larger, 
and the relaxation gives tighter bounds. An important property of this nested 
family of relaxations is guaranteed convergence of the bound to the global 
maximum. 

Regarding the choice of degree $D$ for each level of relaxation, clearly a
polynomial of odd degree cannot be SOS, so we need to consider only the
cases where the expression has even degree. Therefore, the degree of the first
nontrivial relaxation is the smallest even number greater than or equal to
the degree of $\sum_s U_s(x_s)$, and the degree is increased by 2 for the next level.

A key question now becomes: how do we find out, after solving an SOS 
relaxation, if the bound happens to be exact? Fortunately, there is a
sufficient test that can reveal this, using the properties of the SDP and its dual
solution. In [31, 39], a parallel set of relaxations, equivalent to the SOS ones, 
is developed in the dual framework. The dual of checking the nonnegativity 
of a polynomial over a semialgebraic set turns out to be finding a sequence of 
moments that represent a probability measure with support in that set. To 
be a valid set of moments, the sequence should form a positive semidefinite 
moment matrix. Then, each level of relaxation fixes the size of this matrix 
(i.e., considers moments up to a certain order) and therefore solves an SDP.
This is equivalent to fixing the order of the polynomials appearing in SOS 
relaxations. The sufficient rank test checks a rank condition on this moment 
matrix and recovers (one or several) optimal x*, as discussed in [31]. 

In summary, we have the following algorithm for centralized computa- 
tion of a globally optimal rate allocation to nonconcave utility maximization, 
where the utility functions can be written as or converted into polynomials. 
Algorithm 1. Sum-of-squares for nonconcave utility maximization.


1. Formulate the relaxed problem (5.5) for a given degree $D$.

2. Use SDP to solve the $D$th-order relaxation, which can be conducted
using SOSTOOLS [52].

3. If the resulting dual SDP solution satisfies the sufficient rank condition,
the $D$th-order optimizer $\gamma^*(D)$ is the globally optimal network utility, and a
corresponding $x^*$ can be obtained.¹

4. Increase $D$ to $D+2$, that is, the next higher-order relaxation, and repeat.
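
The control flow of Algorithm 1 can be sketched as a short loop. This is a skeleton only: the callable `solve_relaxation` is a hypothetical placeholder for whatever SDP routine is used to solve (5.5) at order D (for instance a wrapper around an SOS toolbox), and its return convention `(gamma, rank_ok, x)` is an assumption made for this illustration, as are the function names and the `max_degree` cutoff.

```python
def first_relaxation_degree(utility_degree):
    """Smallest even integer >= the degree of sum_s U_s(x_s)."""
    return utility_degree + (utility_degree % 2)


def sos_hierarchy(solve_relaxation, utility_degree, max_degree):
    """Steps 1-4 of Algorithm 1 with a caller-supplied (hypothetical) solver.

    solve_relaxation(D) must return (gamma, rank_ok, x): the bound from the
    order-D relaxation, whether the sufficient rank test passed, and the
    recovered rate vector (or None).
    """
    best_bound = None
    D = first_relaxation_degree(utility_degree)
    while D <= max_degree:
        gamma, rank_ok, x = solve_relaxation(D)    # steps 1 and 2
        best_bound = gamma                         # bounds tighten as D grows
        if rank_ok:                                # step 3: bound is exact
            return gamma, x, D, True
        D += 2                                     # step 4: next relaxation
    return best_bound, None, D - 2, False          # only an upper bound


def fake_solver(D):
    """Stand-in solver for demonstration: exact from order 4 onward."""
    if D >= 4:
        return 5.0, True, [0, 1, 2]
    return 6.0, False, None


result = sos_hierarchy(fake_solver, utility_degree=2, max_degree=8)
```

When the rank test never passes before `max_degree`, the last value returned is still a valid upper bound on the network utility, which is the situation the footnote to step 3 describes.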


In the following section, we give examples of the application of SOS
relaxation to the nonconcave NUM. We also apply the above sufficient test to
check if the bound is exact, and if so, we recover the optimum rate allocation
$x^*$ that achieves this tightest bound.


5.2.3 Numerical Examples and Sigmoidal Utilities


Polynomial Utility Examples 


First, consider quadratic utilities (i.e., $U_s(x_s) = x_s^2$) as a simple case to start
with (this can be useful, for example, when the bottleneck link capacity limits
sources to their convex region of a sigmoidal utility). We present examples
that are typical, in our experience, of the performance of the relaxations.


Fig. 5.3 Network topology for Example 5.1. 


Example 5.1. A small illustrative example. Consider the simple 2-link, 3-user
network shown in Figure 5.3, with $c = [1, 2]$. The optimization problem
is

$$
\begin{array}{ll}
\text{maximize} & \sum_s x_s^2 \\
\text{subject to} & x_1 + x_2 \le 1, \\
& x_1 + x_3 \le 2, \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\tag{5.6}
$$


¹ Otherwise, $\gamma^*(D)$ may still be the globally optimal network utility but is only provably
an upper bound.


The first level relaxation with $D = 2$ is

$$
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \gamma - (x_1^2 + x_2^2 + x_3^2) - \lambda_1(-x_1 - x_2 + 1) - \lambda_2(-x_1 - x_3 + 2) \\
& {} - \lambda_3 x_1 - \lambda_4 x_2 - \lambda_5 x_3 - \lambda_6(-x_1 - x_2 + 1)(-x_1 - x_3 + 2) \\
& {} - \lambda_7 x_1(-x_1 - x_2 + 1) - \lambda_8 x_2(-x_1 - x_2 + 1) - \lambda_9 x_3(-x_1 - x_2 + 1) \\
& {} - \lambda_{10} x_1(-x_1 - x_3 + 2) - \lambda_{11} x_2(-x_1 - x_3 + 2) - \lambda_{12} x_3(-x_1 - x_3 + 2) \\
& {} - \lambda_{13} x_1 x_2 - \lambda_{14} x_1 x_3 - \lambda_{15} x_2 x_3 \ \text{is SOS}, \\
& \lambda_i \ge 0, \quad i = 1, \ldots, 15.
\end{array}
\tag{5.7}
$$

The first constraint above can be written as $\mathbf{x}^T Q \mathbf{x}$ for $\mathbf{x} = [1, x_1, x_2, x_3]^T$
and an appropriate $Q$. For example, the (1,1) entry, which is the constant
term, reads $\gamma - \lambda_1 - 2\lambda_2 - 2\lambda_6$; the (2,1) entry, coefficient of $x_1$, reads
$\lambda_1 + \lambda_2 - \lambda_3 + 3\lambda_6 - \lambda_7 - 2\lambda_{10}$; and so on. The expression is SOS if and only if
$Q \succeq 0$. The optimal $\gamma$ is 5, which is achieved by, for example, $\lambda_1 = 1$, $\lambda_2 = 2$,
$\lambda_3 = 1$, $\lambda_8 = 1$, $\lambda_{10} = 1$, $\lambda_{12} = 1$, $\lambda_{13} = 1$, $\lambda_{14} = 2$ and the rest of the $\lambda_i$
equal to zero. Using the sufficient test (or, in this example, by inspection) we
find the optimal rates $x_0 = [0, 1, 2]$.

In this example, many of the $\lambda_i$ could be chosen to be zero. This means
not all product terms appearing in (5.7) are needed in constructing the SOS 
polynomial. Such information is valuable from the decentralization point of 
view, and can help determine to what extent our bound can be calculated in 
a distributed manner. This is a challenging topic for future work. 
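
The certificate of Example 5.1 can be checked directly. The sketch below is illustrative only; it assumes the nonzero multipliers are $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 1$, $\lambda_8 = 1$, $\lambda_{10} = 1$, $\lambda_{12} = 1$, $\lambda_{13} = 1$, $\lambda_{14} = 2$, and the function names are hypothetical. With these values the expression in (5.7) with $\gamma = 5$ vanishes identically, which is trivially SOS; the code confirms this at random integer points, where the arithmetic is exact.

```python
import random

# Nonzero multipliers of the D = 2 certificate (indices follow (5.7)).
LAMBDA = {1: 1, 2: 2, 3: 1, 8: 1, 10: 1, 12: 1, 13: 1, 14: 2}


def utility(x1, x2, x3):
    return x1 ** 2 + x2 ** 2 + x3 ** 2


def weighted_constraints(x1, x2, x3):
    """Nonnegative combination of constraint terms; every term is >= 0
    whenever x is feasible for (5.6)."""
    g1 = 1 - x1 - x2          # slack of link 1
    g2 = 2 - x1 - x3          # slack of link 2
    lam = LAMBDA
    return (lam[1] * g1 + lam[2] * g2 + lam[3] * x1
            + lam[8] * x2 * g1 + lam[10] * x1 * g2 + lam[12] * x3 * g2
            + lam[13] * x1 * x2 + lam[14] * x1 * x3)


def residual(x1, x2, x3, gamma=5):
    """The expression required to be SOS in (5.7)."""
    return gamma - utility(x1, x2, x3) - weighted_constraints(x1, x2, x3)


random.seed(0)
points = [tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(200)]
identity_holds = all(residual(*p) == 0 for p in points)
```

Because the residual is zero everywhere, the utility equals 5 minus a quantity that is nonnegative on the feasible set, so 5 is an upper bound; the point (0, 1, 2) is feasible, makes every weighted term vanish, and attains it.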


Fig. 5.4 Network topology for Example 5.2.


Example 5.2. Larger tree topology. As a larger example, consider the net- 
work shown in Figure 5.4 with seven links. There are nine users, with the 
following routing table that lists the links on each user’s path. 
For $c = [5, 10, 4, 3, 7, 3, 5]$, we obtain the bound $\gamma = 116$ with $D = 2$,
which turns out to be globally optimal, and the globally optimal rate vector
can be recovered: $x_0 = [5, 0, 4, 0, 1, 0, 0, 5, 7]$. In this example, exhaustive
search is too computationally intensive, and the sufficient condition test plays
an important role in proving the bound is exact and in recovering $x_0$.


Fig. 5.5 Network topology for Example 5.3. 


Example 5.3. Large m-hop ring topology. Consider a ring network with $n$
nodes, $n$ users, and $n$ links where each user’s flow starts from a node and goes
clockwise through the next $m$ links, as shown in Figure 5.5 for $n = 6$, $m = 2$.
As a large example, with $n = 25$, $m = 2$, and capacities chosen randomly
from a uniform distribution on [0, 10], using relaxation of order $D = 2$ we
obtain the exact bound $\gamma = 321.11$ and recover an optimal rate allocation.
For $n = 30$, $m = 2$, and capacities randomly chosen from [0, 15], it turns out
that the $D = 2$ relaxation yields the exact bound 816.95 and a globally optimal
rate allocation.
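
The routing structure of the ring in Example 5.3 is simple to generate. The sketch below assumes a 0-indexed labeling that is not specified in the source: link $l$ joins node $l$ to node $l+1$ (modulo $n$), and user $s$ starts at node $s$; the function names are hypothetical.

```python
def ring_routes(n, m):
    """L(s): the m consecutive clockwise links used by user s (0-indexed)."""
    return [[(s + h) % n for h in range(m)] for s in range(n)]


def users_on_link(routes, n):
    """S(l): the users whose path contains link l."""
    return [[s for s, path in enumerate(routes) if l in path] for l in range(n)]


routes = ring_routes(6, 2)          # the n = 6, m = 2 case of Figure 5.5
shared = users_on_link(routes, 6)
```

Every link ends up shared by exactly $m$ users, so each capacity constraint in (5.1) couples $m$ consecutive rates.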


Sigmoidal Utility Examples 


Now consider sigmoidal utilities in a standard form:

$$
U_s(x_s) = \frac{1}{1 + e^{-(a_s x_s + b_s)}},
$$

where $\{a_s, b_s\}$ are constant integers. Even though these sigmoidal functions
are not polynomials, we show the problem can be cast as one with polynomial
cost and constraints, with a change of variables.


Example 5.4. Sigmoidal utility. Consider the simple 2-link, 3-user example
shown in Figure 5.3 for $a_s = 1$ and $b_s = -5$.
The NUM problem is to

$$
\begin{array}{ll}
\text{maximize} & \sum_s \dfrac{1}{1 + e^{-(x_s - 5)}} \\
\text{subject to} & x_1 + x_2 \le c_1, \\
& x_1 + x_3 \le c_2, \\
& x \succeq 0.
\end{array}
\tag{5.8}
$$

Let $y_s = 1 / (1 + e^{-(x_s - 5)})$, then $x_s = -\log((1/y_s) - 1) + 5$. Substituting
for $x_1$, $x_2$ in the first constraint, arranging terms and taking exponentials,
then multiplying the sides by $y_1 y_2$ (note that $y_1, y_2 > 0$), we get

$$
(1 - y_1)(1 - y_2) \ge e^{(10 - c_1)} y_1 y_2,
$$

which is polynomial in the new variables $y$. This applies to all capacity
constraints, and the nonnegativity constraints for $x_s$ translate to
$y_s \ge 1 / (1 + e^5)$. Therefore the whole problem can be written in polynomial form,
and SOS methods apply. This transformation renders the problem polynomial
for general sigmoidal utility functions, with any $a_s$ and $b_s$.
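
The equivalence between the original capacity constraint and its polynomial form can be spot-checked numerically. The sketch below is illustrative only, for the case $a_s = 1$, $b_s = -5$; the function names, the sampling ranges, and the tolerance used to skip near-ties are choices made here, not part of the source.

```python
import math
import random


def to_y(x, a=1.0, b=-5.0):
    """Change of variables y = U(x) for the sigmoidal utility."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))


def to_x(y, a=1.0, b=-5.0):
    """Inverse map: recover the rate x from the utility value y."""
    return (-math.log(1.0 / y - 1.0) - b) / a


def capacity_ok(x1, x2, c):
    return x1 + x2 <= c


def polynomial_ok(y1, y2, c):
    """Transformed constraint (1 - y1)(1 - y2) >= e^(10 - c) * y1 * y2."""
    return (1 - y1) * (1 - y2) >= math.exp(10 - c) * y1 * y2


random.seed(1)
agree = True
for _ in range(1000):
    x1, x2 = random.uniform(0, 10), random.uniform(0, 10)
    c = random.uniform(1, 15)
    if abs(x1 + x2 - c) < 1e-6:
        continue  # skip numerically ambiguous boundary cases
    agree &= capacity_ok(x1, x2, c) == polynomial_ok(to_y(x1), to_y(x2), c)
```

The lower bound on $y_s$ comes from the same map: a rate of zero corresponds to `to_y(0.0)`, which equals $1/(1 + e^5)$.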

We present some numerical results, using a small illustrative example. Here
SOS relaxations of order 4 ($D = 4$) were used. For $c_1 = 4$, $c_2 = 8$, we find
$\gamma = 1.228$, which turns out to be a global optimum, with $x_0 = [0, 4, 8]$
as the optimal rate vector. For $c_1 = 9$, $c_2 = 10$, we find $\gamma = 1.982$ and
$x_0 = [0, 9, 10]$. Now place a weight of 2 on $y_1$, and the other $y_s$ have weight
one; we obtain $\gamma = 1.982$ and $x_0 = [9, 0, 1]$.

In general, if $a_s \ne 1$ for some $s$, however, the degree of the polynomials in
the transformed problem may be very high. If we write the general problem
as

$$
\begin{array}{ll}
\text{maximize} & \sum_s \dfrac{1}{1 + e^{-(a_s x_s + b_s)}} \\
\text{subject to} & \sum_{s \in S(l)} x_s \le c_l, \quad \forall l, \\
& x \succeq 0,
\end{array}
\tag{5.9}
$$

each capacity constraint after transformation will be

$$
\prod_s (1 - y_s)^{r_{ls} \prod_{k \ne s} a_k} \ge
\exp\Bigl( -\prod_s a_s \Bigl( c_l + \sum_s r_{ls} b_s / a_s \Bigr) \Bigr) \prod_s y_s^{r_{ls} \prod_{k \ne s} a_k},
$$

where $r_{ls} = 1$ if $l \in L(s)$ and equals 0 otherwise. Because the product of the
$a_s$ appears in the exponents, $a_s > 1$ significantly increases the degree of the
polynomials appearing in the problem and hence the dimension of the SDP
in the SOS method.

It is therefore also useful to consider alternative representations of
sigmoidal functions such as the following rational function:

$$
U_s(x_s) = \frac{x_s^n}{a + x_s^n},
$$

where the inflection point is $x^0 = \bigl( a(n-1)/(n+1) \bigr)^{1/n}$ and the slope at
the inflection point is $U_s'(x^0) = \bigl( (n-1)(n+1)/(4n) \bigr) \bigl( (n+1)/(a(n-1)) \bigr)^{1/n}$. Let
$y_s = U_s(x_s)$; the NUM problem in this case is equivalent to

$$
\begin{array}{ll}
\text{maximize} & \sum_s y_s \\
\text{subject to} & x_s^n - y_s x_s^n - a y_s = 0, \\
& \sum_{s \in S(l)} x_s \le c_l, \quad \forall l, \\
& x \succeq 0,
\end{array}
\tag{5.10}
$$

which again can be accommodated in the SOS method and be solved by
Algorithm 1.

The benefit of this choice of utility function is that the largest degree of 
the polynomials in the problem is n+ 1, therefore growing linearly with n. 
The disadvantage compared to the exponential form for sigmoidal functions 
is that the location of the inflection point and the slope at that point cannot 
be set independently. 
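
The properties of the rational sigmoid can be checked numerically. The sketch below assumes the utility has the form $U(x) = x^n / (a + x^n)$, as implied by the equality constraint in (5.10); the parameter values $a = 2$, $n = 4$, the finite-difference step, and the function names are arbitrary illustrative choices.

```python
def rational_utility(x, a, n):
    """Rational sigmoid U(x) = x^n / (a + x^n) (form implied by (5.10))."""
    return x ** n / (a + x ** n)


def inflection_point(a, n):
    """Closed-form inflection point (a (n - 1) / (n + 1))^(1/n)."""
    return (a * (n - 1) / (n + 1)) ** (1.0 / n)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


a, n = 2.0, 4
x0 = inflection_point(a, n)


def u(x):
    return rational_utility(x, a, n)


curvature_at_x0 = second_derivative(u, x0)        # should be close to zero
y0 = u(x0)
# y = U(x) satisfies the polynomial equality constraint of (5.10).
constraint_residual = x0 ** n - y0 * x0 ** n - a * y0
```

The curvature is positive below `x0` and negative above it, which is the convex-then-concave shape of a sigmoid, and the highest-degree monomial in the constraint is $y_s x_s^n$, of degree $n + 1$.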


5.2.4 Alternative Representations for Convex
Relaxations to Nonconcave NUM


The SOS relaxation we used in the last two sections is based on Schmüdgen’s
representation for positive polynomials over compact sets described by other 
polynomials. We now briefly discuss two other representations of relevance 
to the NUM, that are interesting from both theoretical (e.g., interpretation) 
and computational points of view. 


LP Relaxation 


Exploiting linearity of the constraints in NUM and with the additional
assumption of nonempty interior for the feasible set (which holds for NUM),
we can use Handelman’s representation [30] and refine the positivstellensatz
condition to obtain the following convex relaxation of the nonconcave NUM
problem.

$$
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \gamma - \sum_s U_s(x_s) = \sum_{\alpha \in \mathbf{N}^L} \lambda_\alpha \prod_{l=1}^{L} \bigl( c_l - \sum_{s \in S(l)} x_s \bigr)^{\alpha_l}, \quad \forall x, \\
& \lambda_\alpha \ge 0, \quad \forall \alpha,
\end{array}
\tag{5.11}
$$

where the optimization variables are $\gamma$ and $\lambda_\alpha$, and $\alpha$ denotes an ordered set
of integers $\{\alpha_l\}$.

Fixing $D$ where $\sum_l \alpha_l \le D$, and equating the coefficients on the two
sides of the equality in (5.11), yields a linear program (LP). There are no
SOS terms, therefore no semidefiniteness conditions. As before, increasing
the degree $D$ gives higher-order relaxations and a tighter bound.
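
To give a feel for the size of the LP at each level, the exponent vectors $\alpha$ in (5.11) can be enumerated directly. This is a rough sketch with a hypothetical function name; it counts only the multipliers $\lambda_\alpha$ attached to products of the $L$ link constraints, one LP variable per $\alpha$.

```python
from itertools import product
from math import comb


def handelman_exponents(num_links, D):
    """All alpha in N^L with alpha_1 + ... + alpha_L <= D."""
    return [alpha for alpha in product(range(D + 1), repeat=num_links)
            if sum(alpha) <= D]


alphas = handelman_exponents(2, 2)            # two links, order D = 2
count_seven_links = len(handelman_exponents(7, 2))
```

The number of such vectors is the binomial coefficient $\binom{L + D}{D}$, so for a fixed order $D$ the number of multipliers grows polynomially in the number of links.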

We provide a pricing interpretation for problem (5.11). First, normalize
each capacity constraint as $1 - u_l(x) \ge 0$, where $u_l(x) = \sum_{s \in S(l)} x_s / c_l$. We
can interpret $u_l(x)$ as link usage, or the probability that link $l$ is used at
any given point in time. Then, in (5.11), we have terms linear in $u$ such as
$\lambda_l (1 - u_l(x))$, in which $\lambda_l$ has a similar interpretation as in concave NUM, as
the price of using link $l$. We also have product terms such as
$\lambda_{jk} (1 - u_j(x))(1 - u_k(x))$, where $\lambda_{jk} u_j(x) u_k(x)$ indicates the probability of
simultaneous usage of links $j$ and $k$, for links whose usage probabilities are
independent (e.g., they do not share any flows). Products of more terms can be
interpreted similarly.

Although the above price interpretation is not complete and does not jus- 
tify all the terms appearing in (5.11) (e.g., powers of the constraints, product 
terms for links with shared flows), it does provide some useful intuition: this 
relaxation results in a pricing scheme that provides better incentives for the 
users to observe the constraints, by giving an additional reward (because the 
corresponding term adds positively to the utility) for simultaneously keeping 
two links free. Such an incentive helps tighten the upper bound and eventually 
achieve a feasible (and optimal) allocation. 

This relaxation is computationally attractive because we need to solve 
LPs instead of the previous SDPs at each level. However, significantly more 
levels may be required [40]. 


Relaxation with No Product Terms 


Putinar [53] showed that a polynomial positive over a compact set² can be 
represented as an SOS-combination of the constraints. This yields the 
following convex relaxation for the nonconcave NUM problem. 


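\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & \gamma - \sum_s U_s(x_s) = \sum_{l=1}^{L} \lambda_l(\mathbf{x}) \Bigl( c_l - \sum_{s \in S(l)} x_s \Bigr), \quad \forall \mathbf{x}, \\
& \lambda(\mathbf{x}) \text{ is SOS},
\end{array}
\tag{5.12}
\]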


where the optimization variables are the coefficients in $\lambda_l(\mathbf{x})$. Similar 
to the SOS relaxation (5.5), fixing the order $D$ of the expression in (5.12) 
results in an SDP. This relaxation has the nice property that no product terms 
appear: the relaxation becomes exact with a high enough $D$ without the need of 
product terms. However, this degree might be much higher than what the 
previous SOS method requires. 


2 With an extra assumption that always holds for linear constraints as in NUM problems. 


5.2.5 Concluding Remarks and Future Directions 


We consider the NUM problem in the presence of inelastic flows, that is, flows 
with nonconcave utilities. Despite its practical importance, this problem has 
not been studied widely, mainly due to the fact it is a nonconvex problem. 
There has been no effective mechanism, centralized or distributed, to com- 
pute the globally optimal rate allocation for nonconcave utility maximization 
problems in networks. This limitation has made performance assessment and 
design of networks that include inelastic flows very difficult. 

In one of the recent works on this topic [19], we employed convex SOS re- 
laxations, solved by a sequence of SDPs, to obtain high-quality, increasingly 
tighter upper bounds on total achievable utility. In practice, the performance 
of our SOSTOOLS-based algorithm was surprisingly good, and bounds ob- 
tained using a polynomial-time (and indeed a low-order and often minimal- 
order) relaxation were found to be exact, achieving the global optimum of 
nonconcave NUM problems. Furthermore, a dual-based sufficient test, if suc- 
cessful, detects the exactness of the bound, in which case the optimal rate 
allocation can also be recovered. This performance of the proposed algorithm 
brings up a fundamental question on whether there is any particular property 
or structure in nonconcave NUM that makes it especially suitable for SOS 
relaxations. 

We further examined the use of two more specialized polynomial repre- 
sentations, one that uses products of constraints with constant multipliers, 
resulting in LP relaxations; and at the other end of spectrum, one that uses 
a linear combination of constraints with SOS multipliers. We expect these re- 
laxations to give higher-order certificates, thus their potential computational 
benefits need to be examined further. We also show they admit economics 
interpretations (e.g., prices, incentives) that provide some insight on how the 
SOS relaxations work in the framework of link congestion pricing for the 
simultaneous usage of multiple links. 

An important research issue to be further investigated is decentralization 
methods for rate allocation among sources with nonconcave utilities. The 
proposed algorithm here is not easy to decentralize, given the products of the 
constraints or polynomial multipliers that destroy the separable structure of 
the problem. However, when relaxations become exact, the sparsity pattern 
of the coefficients can provide information about partially decentralized 
computation of optimal rates. For example, if after solving the NUM offline we 
obtain an exact bound, and the coefficient of the cross-term $x_i x_j$ turns out 
to be zero, then users $i$ and $j$ do not need to communicate with each other 
to find their optimal rates. An interesting next step in this area of research is 
to investigate a distributed version of the proposed algorithm through limited 
message passing among clusters of network nodes and links. 


5.3 Wireless Network Power Control 


5.3.1 Introduction 


Due to the broadcast nature of radio transmission, data rates and other qual- 
ity of service (QoS) issues in a wireless network are affected by interference. 
This is particularly important in CDMA systems where users transmit at 
the same time over the same frequency bands and their spreading codes are 
not perfectly orthogonal. Transmit power control is often used to tackle this 
problem of signal interference [12]. We study how to optimize over the trans- 
mit powers to create the optimal set of signal-to-interference ratios (SIR) on 
wireless links. Optimality here can be with respect to a variety of objectives, 
such as maximizing a systemwide efficiency metric (e.g., the total system 
throughput), or maximizing a QoS metric for a user in the highest QoS class, 
or maximizing a QoS metric for the user with the minimum QoS metric value 
(i.e., a maxmin optimization). 

The objective represents a systemwide goal to be optimized; however, in- 
dividual users’ QoS requirements also need to be satisfied. Any power alloca- 
tion must therefore be constrained by a feasible set formed by these minimum 
requirements from the users. Such a constrained optimization captures the 
tradeoff between user-centric constraints and some network-centric objective. 
Because a higher power level from one transmitter increases the interference 
levels at other receivers, there may not be any feasible power allocation to 
satisfy the requirements from all the users. Sometimes an existing set of re- 
quirements can be satisfied, but when a new user is admitted into the system, 
there exist no more feasible power control solutions, or the maximized ob- 
jective is reduced due to the tightening of the constraint set, leading to the 
need for admission control and admission pricing, respectively. 

Because many QoS metrics are nonlinear functions of SIR, which is in turn 
a nonlinear (and neither convex nor concave) function of transmit powers, 
in general, power control optimization or feasibility problems are difficult 
nonlinear optimization problems that may appear to be NP-hard problems. 
Following [14, 35], this section shows that, when SIR is much larger than 0 dB, 
a class of nonlinear optimization called geometric programming (GP) can be 
used to efficiently compute the globally optimal power control in many of 
these problems, and efficiently determine the feasibility of user requirements 
by returning either a feasible (and indeed optimal) set of powers or a cer- 
tificate of infeasibility. This also leads to an effective admission control and 
admission pricing method. 

The key observation is that despite the apparent nonconvexity, through log 
change of variable the GP technique turns these constrained optimizations 
of power control into convex optimization, which is intrinsically tractable 
despite its nonlinearity in objective and constraints. However, when SIR is 
comparable to or below 0 dB, the power control problems are truly nonconvex 
with no efficient and global solution methods. In this case, we present a 
heuristic that is provably convergent and empirically almost always computes 
the globally optimal power allocation by solving a sequence of GPs through 
the approach of successive convex approximations. 

The GP approach reveals the hidden convexity structure, which implies 
efficient solution methods and the global optimality of any local optimum in 
power control problems with nonlinear objective functions. It clearly differ- 
entiates the tractable formulations in a high-SIR regime from the intractable 
ones in a low-SIR regime. Power control by GP is applicable to formulations 
in both cellular networks with single-hop transmission between mobile users 
and base stations, and ad hoc networks with multihop transmission among 
the nodes, as illustrated through several numerical examples in this section. 
Traditionally, GP is solved by centralized computation through the highly 
efficient interior point methods. In this section we present a new result on 
how GP can be solved distributively with message passing, which has 
independent value for the general maximization of coupled objectives, and 
apply it to power control problems with a further reduction of message-passing 
overhead by leveraging the specific structures of power control problems. 

More generally, the technique of nonlinear change of variables, including 
the log change of variables, to reveal “hidden” convexity in optimization for- 
mulations has recently become quite popular in the communication network 
research community. 


5.3.2 Geometric Programming 


GP is a class of nonlinear, nonconvex optimization problems with many useful 
theoretical and computational properties. It was invented in 1967 by Duffin, 
Peterson, and Zener [17], and much of the development by the early 1980s was 
summarized in [1]. Because a GP can be turned into a convex optimization 
problem, a local optimum is also a global optimum, the Lagrange duality gap 
is zero under mild conditions, and a global optimum can be computed very 
efficiently. Numerical efficiency holds both in theory and in practice: interior 
point methods applied to GP have provably polynomial-time complexity [48], 
and are very fast in practice with high-quality software downloadable from 
the Internet (e.g., the MOSEK package). Convexity and duality properties 
of GP are well understood, and large-scale, robust numerical solvers for GP 
are available. Furthermore, special structures in GP and its Lagrange dual 
problem lead to distributed algorithms, physical interpretations, and compu- 
tational acceleration beyond the generic results for convex optimization. A 
detailed tutorial of GP and comprehensive survey of its recent applications 
to communication systems and to circuit design can be found in [11] and [3], 
respectively. This section contains a brief introduction of GP terminology. 




There are two equivalent forms of GP: standard form and convex form. 
The first is a constrained optimization of a type of function called posynomial, 
and the second form is obtained from the first through a logarithmic change 
of variable. 

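We first define a monomial as a function $f : \mathbb{R}^n_{++} \to \mathbb{R}$: 

\[ f(\mathbf{x}) = d\, x_1^{a^{(1)}} x_2^{a^{(2)}} \cdots x_n^{a^{(n)}}, \]

where the multiplicative constant $d > 0$ and the exponential constants 
$a^{(j)} \in \mathbb{R}$, $j = 1, 2, \ldots, n$. A sum of monomials, indexed by $k$ 
below, is called a posynomial: 

\[ f(\mathbf{x}) = \sum_{k=1}^{K} d_k\, x_1^{a_k^{(1)}} x_2^{a_k^{(2)}} \cdots x_n^{a_k^{(n)}}, \]

where $d_k > 0$, $k = 1, 2, \ldots, K$, and $a_k^{(j)} \in \mathbb{R}$, 
$j = 1, 2, \ldots, n$, $k = 1, 2, \ldots, K$. 
For example, $2 x_1^{-\pi} x_2^{0.5} + 3 x_1 x_3^{100}$ is a posynomial in $\mathbf{x}$, 
$x_1 - x_2$ is not a posynomial, and $x_1 / x_2$ is a monomial, thus also a 
posynomial. 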
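
The two definitions can be made concrete with a minimal Python sketch. The function names `monomial` and `posynomial`, the `(d, exponents)` term layout, and the sample posynomial are illustrative assumptions, not notation from the text.

```python
import math

def monomial(d, exponents, x):
    """Evaluate d * x_1**a_1 * ... * x_n**a_n for strictly positive x."""
    if d <= 0 or any(xj <= 0 for xj in x):
        raise ValueError("monomials need d > 0 and x > 0")
    return d * math.prod(xj ** aj for xj, aj in zip(x, exponents))

def posynomial(terms, x):
    """Evaluate a sum of monomials; terms is a list of (d_k, exponents_k)."""
    return sum(monomial(d, a, x) for d, a in terms)

# Hypothetical posynomial: 2 * x1**(-0.5) * x2 + 3 * x1 * x2**2.
# Exponents may be any real numbers; only the coefficients must be positive.
terms = [(2.0, (-0.5, 1.0)), (3.0, (1.0, 2.0))]
value = posynomial(terms, (4.0, 1.0))  # 2 * 0.5 * 1 + 3 * 4 * 1
print(value)
```

A negative coefficient, as in $x_1 - x_2$, is rejected by `monomial`, which mirrors why that expression is not a posynomial.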

Minimizing a posynomial subject to posynomial upper bound inequality 
constraints and monomial equality constraints is called GP in standard form: 


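\[
\begin{array}{lll}
\text{minimize} & f_0(\mathbf{x}) & \\
\text{subject to} & f_i(\mathbf{x}) \le 1, & i = 1, 2, \ldots, m, \\
& h_l(\mathbf{x}) = 1, & l = 1, 2, \ldots, M,
\end{array}
\tag{5.13}
\]

where $f_i$, $i = 0, 1, \ldots, m$, are posynomials: 
$f_i(\mathbf{x}) = \sum_{k=1}^{K_i} d_{ik}\, x_1^{a_{ik}^{(1)}} x_2^{a_{ik}^{(2)}} \cdots x_n^{a_{ik}^{(n)}}$, 
and $h_l$, $l = 1, 2, \ldots, M$, are monomials: 
$h_l(\mathbf{x}) = d_l\, x_1^{a_l^{(1)}} x_2^{a_l^{(2)}} \cdots x_n^{a_l^{(n)}}$. 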


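GP in standard form is not a convex optimization problem, because 
posynomials are not convex functions. However, with a logarithmic change of the 
variables and multiplicative constants: $y_i = \log x_i$, $b_{ik} = \log d_{ik}$, 
$b_l = \log d_l$, and a logarithmic change of the functions' values, we can turn 
it into the following equivalent problem in $\mathbf{y}$. 

\[
\begin{array}{lll}
\text{minimize} & p_0(\mathbf{y}) = \log \sum_{k=1}^{K_0} \exp(\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}) & \\
\text{subject to} & p_i(\mathbf{y}) = \log \sum_{k=1}^{K_i} \exp(\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}) \le 0, & i = 1, 2, \ldots, m, \\
& q_l(\mathbf{y}) = \mathbf{a}_l^T \mathbf{y} + b_l = 0, & l = 1, 2, \ldots, M.
\end{array}
\tag{5.14}
\]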


This is referred to as GP in convex form, which is a convex optimization 
problem because it can be verified that the log-sum-exp function is convex 
[4]. 
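
The change of variables can be checked numerically. The sketch below, with hypothetical helper names and a made-up two-term posynomial, confirms that the log-sum-exp form evaluated at $y = \log x$ equals the log of the posynomial at $x$, and spot-checks convexity at one midpoint. It is an illustration under these assumptions, not a proof.

```python
import math

def posynomial(terms, x):
    """Sum of d_k * prod_j x_j**a_kj over terms given as (d_k, exponents_k)."""
    return sum(d * math.prod(xj ** aj for xj, aj in zip(x, a)) for d, a in terms)

def log_sum_exp_form(terms, y):
    """p(y) = log sum_k exp(a_k^T y + b_k), with b_k = log d_k."""
    z = [sum(aj * yj for aj, yj in zip(a, y)) + math.log(d) for d, a in terms]
    m = max(z)  # subtract the maximum before exponentiating, for stability
    return m + math.log(sum(math.exp(zk - m) for zk in z))

# Hypothetical posynomial 2 * x1**(-0.5) * x2 + 3 * x1 * x2**2.
terms = [(2.0, (-0.5, 1.0)), (3.0, (1.0, 2.0))]
x = (4.0, 1.0)
y = tuple(math.log(xj) for xj in x)
lhs = log_sum_exp_form(terms, y)        # convex form at y = log x
rhs = math.log(posynomial(terms, x))    # log of the standard form at x

# Midpoint convexity spot check: average of values minus value at the average.
y1, y2 = (0.3, -1.2), (-0.7, 0.9)
mid = tuple((u + v) / 2 for u, v in zip(y1, y2))
convex_gap = (log_sum_exp_form(terms, y1) + log_sum_exp_form(terms, y2)) / 2 \
    - log_sum_exp_form(terms, mid)
```

A nonnegative `convex_gap` at every pair of points is what convexity of the log-sum-exp function guarantees; the sketch tests a single pair only.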

In summary, GP is a nonlinear, nonconvex optimization problem that can 
be transformed into a nonlinear convex problem. GP in standard form can 
be used to formulate network resource allocation problems with nonlinear 
objectives under nonlinear QoS constraints. The basic idea is that resources 
are often allocated in proportion to some parameters, and when resource 
allocations are optimized over these parameters, we are maximizing an inverted 
posynomial subject to lower bounds on other inverted posynomials, which 
is equivalent to GP in standard form. 


Fig. 5.6 A bivariate posynomial before (left graph) and after (right graph) the log 
transformation. A nonconvex function is turned into a convex one. 


SP/GP, SOS/SDP 


Note that, although the posynomial seems to be a nonconvex function, it be- 
comes a convex function after the log transformation, as shown in an example 
in Figure 5.6. Compared to the (constrained or unconstrained) minimization 
of a polynomial, the minimization of a posynomial in GP relaxes the integer 
constraint on the exponential constants but imposes a positivity constraint on 
the multiplicative constants and variables. There is a sharp contrast between 
these two problems: polynomial minimization is NP-hard, but GP can be 
turned into convex optimization with provably polynomial-time algorithms 
for a global optimum. 

In an extension of GP called signomial programming discussed later in this 
section, the restriction of nonnegative multiplicative constants is removed. 
This results in a general class of nonlinear and truly nonconvex problems 
that is simultaneously a generalization of GP and polynomial minimization 
over the positive quadrant, as summarized in the comparison Table 5.2. 


Table 5.2 Comparison of GP, constrained polynomial minimization over the positive 
quadrant (PMoP), and signomial programming (SP). All three types of problems minimize 
a sum of monomials subject to upper bound inequality constraints on sums of monomials, 
but have different definitions of monomial: $c \prod_j x_j^{a^{(j)}}$, as shown in the 
table. GP is known to be polynomial-time solvable, but PMoP and SP are not. 

              GP                PMoP              SP
  $c$         $\mathbb{R}_+$    $\mathbb{R}$      $\mathbb{R}$
  $a^{(j)}$   $\mathbb{R}$      $\mathbb{Z}_+$    $\mathbb{R}$


The objective function of signomial programming can be formulated as 
minimizing a ratio between two posynomials, which is not a posynomial (be- 
cause posynomials are closed under positive multiplication and addition but 
not division). As shown in Figure 5.7, a ratio between two posynomials is a 
nonconvex function both before and after the log transformation. Although it 
does not seem likely that signomial programming can be turned into a convex 
optimization problem, there are heuristics to solve it through a sequence of 
GP relaxations. However, due to the absence of algebraic structures found 
in polynomials, such methods for signomial programming currently lack a 
theoretical foundation of convergence to global optimality. This is in contrast 
to the sum-of-squares method [51], which uses a nested family of SDP 
relaxations to solve constrained polynomial minimization problems as explained 
in the last section. 


Fig. 5.7 Ratio between two bivariate posynomials before (left graph) and after (right 
graph) the log transformation. It is a nonconvex function in both cases. 


5.3.3 Power Control by Geometric Programming: 
Convex Case 


Various schemes for power control, centralized or distributed, have been ex- 
tensively studied since the 1990s based on different transmission models and 
application needs (e.g., in [2, 26, 47, 55, 63, 72]). This section summarizes 
the new approach of formulating power control problems through GP. The 
key advantage is that globally optimal power allocations can be efficiently 
computed for a variety of nonlinear systemwide objectives and user QoS 
constraints, even when these nonlinear problems appear to be nonconvex 
optimization. 


Basic Model 


Consider a wireless (cellular or multihop) network with $n$ logical 
transmitter/receiver pairs. Transmit powers are denoted as $P_1, \ldots, P_n$. In 
the cellular uplink case, all logical receivers may reside in the same physical 
receiver, that is, the base station. In the multihop case, because the 
transmission environment can be different on the links comprising an end-to-end 
path, power control schemes must consider each link along a flow's path. 

Under Rayleigh fading, the power received from transmitter $j$ at receiver 
$i$ is given by $G_{ij} F_{ij} P_j$ where $G_{ij} > 0$ represents the path gain (it 
may also encompass antenna gain and coding gain) that is often modeled as 
proportional to $d_{ij}^{-\gamma}$, where $d_{ij}$ denotes distance, $\gamma$ is the 
power fall-off factor, and $F_{ij}$ model Rayleigh fading and are independent and 
exponentially distributed with unit mean. The distribution of the received 
power from transmitter $j$ at receiver $i$ is then exponential with mean value 
$E[G_{ij} F_{ij} P_j] = G_{ij} P_j$. The SIR for the receiver on logical link $i$ is: 

\[ \mathrm{SIR}_i = \frac{P_i G_{ii} F_{ii}}{\sum_{j \ne i} P_j G_{ij} F_{ij} + n_i}, \tag{5.15} \]

where $n_i$ is the noise power for receiver $i$. 

The constellation size $M$ used by a link can be closely approximated for 
MQAM modulations as $M = 1 + \bigl(-\phi_1 / \ln(\phi_2 \mathrm{BER})\bigr)\, \mathrm{SIR}$, 
where BER is the bit error rate and $\phi_1$, $\phi_2$ are constants that depend on 
the modulation type. Defining $K = -\phi_1 / \ln(\phi_2 \mathrm{BER})$ leads to an 
expression of the data rate $R_i$ on the $i$th link as a function of the SIR: 
$R_i = (1/T) \log_2 (1 + K\, \mathrm{SIR}_i)$, which can be approximated as 

\[ R_i = \frac{1}{T} \log_2 (K\, \mathrm{SIR}_i) \tag{5.16} \]

when $K\, \mathrm{SIR}$ is much larger than 1. This approximation is reasonable either 
when the signal level is much higher than the interference level or, in CDMA 
systems, when the spreading gain is large. For notational simplicity in the 
rest of this section, we redefine $G_{ii}$ as $K$ times the original $G_{ii}$, thus 
absorbing constant $K$ into the definition of SIR. 

The aggregate data rate for the system can then be written as 

\[ R_{\mathrm{system}} = \sum_i R_i = \frac{1}{T} \log_2 \Bigl[ \prod_i \mathrm{SIR}_i \Bigr]. \]

So in the high SIR regime, aggregate data rate maximization is equivalent 
to maximizing a product of SIRs. The system throughput is the aggregate 
data rate supportable by the system given a set of users with specified QoS 
requirements. 
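
To get a feel for how tight the high-SIR approximation is, the sketch below compares the exact and approximate aggregate rates for a hypothetical two-link network. The gain matrix, powers, and noise values are invented for illustration, fading is replaced by its unit mean, and the constant $K$ is assumed to be already absorbed into the gains.

```python
import math

def sir(G, P, noise, i):
    """Nonfading SIR_i = G_ii P_i / (sum_{j != i} G_ij P_j + n_i)."""
    interference = sum(G[i][j] * P[j] for j in range(len(P)) if j != i)
    return G[i][i] * P[i] / (interference + noise[i])

def rate_exact(s, T=1.0):
    """R = (1/T) log2(1 + SIR)."""
    return math.log2(1.0 + s) / T

def rate_high_sir(s, T=1.0):
    """High-SIR approximation R = (1/T) log2(SIR)."""
    return math.log2(s) / T

# Hypothetical two-link network with weak cross gains (high-SIR regime).
G = [[1.0, 0.01], [0.02, 1.0]]
P = [1.0, 1.0]
noise = [0.01, 0.01]

sirs = [sir(G, P, noise, i) for i in range(len(P))]
exact_total = sum(rate_exact(s) for s in sirs)
approx_total = sum(rate_high_sir(s) for s in sirs)
log_of_product = math.log2(math.prod(sirs))  # equals approx_total when T = 1
```

With SIRs of roughly 50 and 33, the approximate total differs from the exact one by well under 0.1 bit per symbol period, and it coincides with the log of the SIR product, which is the quantity the GP formulation maximizes.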

Outage probability is another important QoS parameter for reliable 
communication in wireless networks. A channel outage is declared and packets 
lost when the received SIR falls below a given threshold $\mathrm{SIR}_{th}$, often 
computed from the BER requirement. Most systems are interference-dominated 
and the thermal noise is relatively small, thus the $i$th link outage probability 
is 

\[
\begin{aligned}
P_{o,i} &= \mathrm{Prob}\{ \mathrm{SIR}_i \le \mathrm{SIR}_{th} \} \\
        &= \mathrm{Prob}\Bigl\{ G_{ii} F_{ii} P_i \le \mathrm{SIR}_{th} \sum_{j \ne i} G_{ij} F_{ij} P_j \Bigr\}.
\end{aligned}
\]

The outage probability can be expressed as [38] 

\[ P_{o,i} = 1 - \prod_{j \ne i} \frac{1}{1 + \dfrac{\mathrm{SIR}_{th} G_{ij} P_j}{G_{ii} P_i}}, \]

which means that the upper bound $P_{o,i} \le P_{o,i,\max}$ can be written as an 
upper bound on a posynomial in $\mathbf{P}$: 

\[ \prod_{j \ne i} \Bigl( 1 + \frac{\mathrm{SIR}_{th} G_{ij} P_j}{G_{ii} P_i} \Bigr) \le \frac{1}{1 - P_{o,i,\max}}. \tag{5.17} \]
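
The closed-form outage expression can be sanity-checked against simulation. The sketch below is a rough check under stated assumptions: the gains, powers, threshold, sample count, and seed are all hypothetical, fading is drawn as independent unit-mean exponentials, and noise is neglected as in the interference-dominated case described above.

```python
import random

def outage_probability(gains, powers, i, sir_th):
    """Closed form: 1 - prod_{j != i} 1 / (1 + sir_th * G_ij P_j / (G_ii P_i))."""
    keep = 1.0
    for j in range(len(powers)):
        if j != i:
            keep /= 1.0 + sir_th * gains[j] * powers[j] / (gains[i] * powers[i])
    return 1.0 - keep

def outage_monte_carlo(gains, powers, i, sir_th, samples=100_000, seed=0):
    """Empirical outage frequency with unit-mean exponential fading, no noise."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(samples):
        fade = [rng.expovariate(1.0) for _ in powers]
        signal = gains[i] * fade[i] * powers[i]
        interference = sum(gains[j] * fade[j] * powers[j]
                           for j in range(len(powers)) if j != i)
        if signal <= sir_th * interference:
            outages += 1
    return outages / samples

# Hypothetical gains G_i0, G_i1, G_i2 into receiver i = 0, equal powers, SIR_th = 1.
gains, powers = [1.0, 0.1, 0.2], [1.0, 1.0, 1.0]
closed_form = outage_probability(gains, powers, 0, 1.0)  # 1 - 1 / (1.1 * 1.2)
estimate = outage_monte_carlo(gains, powers, 0, 1.0)
```

The product inside `outage_probability` is the reciprocal of the posynomial that appears in the outage constraint, so bounding the outage probability from above is the same as bounding that posynomial from above.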


Cellular Wireless Networks 


We first present how GP-based power control applies to cellular wireless 
networks with one-hop transmission from $N$ users to a base station. These 
results extend the scope of power control by the classical solution in CDMA 
systems that equalizes SIRs, and those by the iterative algorithms (e.g., in 
[2, 26, 47]) that minimize total power (a linear objective function) subject to 
SIR constraints. 

We start the discussion on the suite of power control problem formula- 
tions with a simple objective function and basic constraints. The following 
constrained problem of maximizing the SIR of a particular user i* is a GP. 


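\[
\begin{array}{ll}
\text{maximize} & R_{i^*}(\mathbf{P}) \\
\text{subject to} & R_i(\mathbf{P}) \ge R_{i,\min}, \quad \forall i, \\
& P_{i_1} G_{i_1} = P_{i_2} G_{i_2}, \\
& 0 \le P_i \le P_{i,\max}, \quad \forall i.
\end{array}
\]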


The first constraint, equivalent to $\mathrm{SIR}_i \ge \mathrm{SIR}_{i,\min}$, sets a 
floor on the SIR of other users and protects these users from user $i^*$ 
increasing her transmit power excessively. The second constraint reflects the 
classical power control criterion in solving the near-far problem in CDMA 
systems: the expected received power from one transmitter $i_1$ must equal that 
from another $i_2$. The third constraint reflects regulatory or system 
limitations on transmit powers. All constraints can be verified to be 
inequality upper bounds on posynomials in transmit power vector $\mathbf{P}$. 

Alternatively, we can use GP to maximize the minimum rate among all 
users. The maxmin fairness objective 

\[ \text{maximize}_{\mathbf{P}} \; \min_i \{ R_i \} \]

can be accommodated in GP-based power control because it can be turned 
into equivalently maximizing an auxiliary variable $t$ such that 
$\mathrm{SIR}_i(\mathbf{P}) \ge \exp(t)$, $\forall i$, which has a posynomial objective 
and constraints in $(\mathbf{P}, t)$. 


Example 5.5. A small illustrative example. A simple system comprised of 
five users is used for a numerical example. The five users are spaced at 
distances $d$ of 1, 5, 10, 15, and 20 units from the base station. The power 
fall-off factor $\gamma = 4$. Each user has a maximum power constraint of 
$P_{\max} = 0.5$ mW. The noise power is 0.5 μW for all users. The SIR of all 
users, other than the user we are optimizing for, must be greater than a 
common threshold SIR level $\beta$. In different experiments, $\beta$ is varied to 
observe the effect on the optimized user's SIR. This is done independently for 
the near user at $d = 1$, a medium distance user at $d = 15$, and the far user 
at $d = 20$. The results are plotted in Figure 5.8. 

Several interesting effects are illustrated. First, when the required 
threshold SIR in the constraints is sufficiently high, there is no feasible 
power control solution. At moderate threshold SIR, as $\beta$ is decreased, the 
optimized SIR initially increases rapidly. This is because the optimized user 
is allowed to increase its own power by the sum of the power reductions in the 
four other users, and the noise is relatively insignificant. At low threshold 
SIR, the noise becomes more significant and the power tradeoff from the other 
users less significant, so the curve starts to bend over. Eventually, the 
optimized user reaches its upper bound on power and cannot utilize the excess 
power allowed by the lower threshold SIR for other users. This is exhibited by 
the transition from a sharp bend in the curve to a much shallower sloped curve. 


Fig. 5.8 Constrained optimization of power control in a cellular network (Example 5.5). 
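
The near-far imbalance that motivates the SIR floor in this example can be seen without solving any GP. The sketch below evaluates nonfading SIRs when every user transmits at full power. It assumes a single base station receiver, so that each gain depends only on the transmitter's distance ($G_j = d_j^{-\gamma}$), and it reads the noise level as 0.5 μW; both are assumptions made for illustration, and the function name `uplink_sirs` is hypothetical.

```python
def uplink_sirs(distances, powers_mw, noise_mw, falloff=4.0):
    """Nonfading uplink SIRs at one shared receiver, with gain d ** (-falloff)."""
    gains = [d ** -falloff for d in distances]
    received = [g * p for g, p in zip(gains, powers_mw)]
    total = sum(received)
    # Each user sees every other user's received power as interference.
    return [r / (total - r + noise_mw) for r in received]

distances = [1.0, 5.0, 10.0, 15.0, 20.0]   # units from the base station
full_power = [0.5] * len(distances)        # P_max = 0.5 mW for every user
sirs = uplink_sirs(distances, full_power, noise_mw=0.5e-3)  # 0.5 uW in mW
```

Under these assumptions the nearest user's SIR is in the hundreds while the farthest user's SIR is far below 1, so a common SIR floor for the other users can only be met by backing off the near users' powers, which is the tradeoff the optimization explores.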

We now proceed to show that GP can also be applied to the problem 
formulations with an overall system objective of total system throughput, 
under both user data rate constraints and outage probability constraints. 

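The following constrained problem of maximizing system throughput is a GP. 

\[
\begin{array}{ll}
\text{maximize} & R_{\mathrm{system}}(\mathbf{P}) \\
\text{subject to} & R_i(\mathbf{P}) \ge R_{i,\min}, \quad \forall i, \\
& P_{o,i}(\mathbf{P}) \le P_{o,i,\max}, \quad \forall i, \\
& 0 \le P_i \le P_{i,\max}, \quad \forall i,
\end{array}
\tag{5.18}
\]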


where the optimization variables are the transmit powers $\mathbf{P}$. The objective 
is equivalent to minimizing the posynomial $\prod_i \mathrm{ISR}_i$, where ISR is 
1/SIR. Each ISR is a posynomial in $\mathbf{P}$ and the product of posynomials is 
again a posynomial. The first constraint is from the data rate demand 
$R_{i,\min}$ by each user. The second constraint represents the outage 
probability upper bounds $P_{o,i,\max}$. These inequality constraints put upper 
bounds on posynomials of $\mathbf{P}$, as can be readily verified through (5.16) and 
(5.17). Thus (5.18) is indeed a GP, and efficiently solvable for global optimality. 
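
As a worked check of the claim that each ISR is a posynomial, the expansion below uses the nonfading form of the SIR, that is, the SIR with the fading terms replaced by their unit means. This simplification is an assumption made here for illustration.

```latex
\begin{align*}
\mathrm{ISR}_i = \frac{1}{\mathrm{SIR}_i}
  &= \frac{\sum_{j \ne i} G_{ij} P_j + n_i}{G_{ii} P_i} \\
  &= \sum_{j \ne i} \frac{G_{ij}}{G_{ii}}\, P_j P_i^{-1}
     + \frac{n_i}{G_{ii}}\, P_i^{-1}.
\end{align*}
```

Every term is a monomial in $\mathbf{P}$ with a positive coefficient, so $\mathrm{ISR}_i$ is a posynomial. Because $\log_2$ is increasing, maximizing $(1/T) \log_2 \prod_i \mathrm{SIR}_i$ is the same as minimizing $\prod_i \mathrm{ISR}_i$.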

There are several obvious variations of problem (5.18) that can be solved 
by GP; for example, we can lower bound $R_{\mathrm{system}}$ as a constraint and 
maximize $R_{i^*}$ for a particular user $i^*$, or have a total power $\sum_i P_i$ 
constraint or objective function. 


Table 5.3 Suite of power control optimization solvable by GP 

Objective Function                                   Constraints
(A) Maximize $R_{i^*}$ (specific user)               (a) $R_i \ge R_{i,\min}$ (rate constraint)
(B) Maximize $\min_i R_i$ (worst-case user)          (b) $P_{i_1} G_{i_1} = P_{i_2} G_{i_2}$ (near-far constraint)
(C) Maximize $\sum_i R_i$ (total throughput)         (c) $\sum_i R_i \ge R_{\mathrm{system},\min}$ (sum rate constraint)
(D) Maximize $\sum_i w_i R_i$ (weighted rate sum)    (d) $P_{o,i} \le P_{o,i,\max}$ (outage probability constraint)
(E) Minimize $\sum_i P_i$ (total power)              (e) $0 \le P_i \le P_{i,\max}$ (power constraint)


The objective function to be maximized can also be generalized to a 
weighted sum of data rates, $\sum_i w_i R_i$, where $\mathbf{w} \succeq 0$ is a given 
weight vector. This is still a GP because maximizing 
$\sum_i w_i \log \mathrm{SIR}_i$ is equivalent to maximizing 
$\log \prod_i \mathrm{SIR}_i^{w_i}$, which is in turn equivalent to minimizing 
$\prod_i \mathrm{ISR}_i^{w_i}$. Now use auxiliary variables $\{t_i\}$, and minimize 
$\prod_i t_i^{w_i}$ over the original constraints in (5.18) plus the additional 
constraints $\mathrm{ISR}_i \le t_i$ for all $i$. This is readily verified to be a GP 
in $(\mathbf{P}, \mathbf{t})$, and is equivalent to the original problem. 


Generalizing the above discussions and observing that high-SIR assump- 
tion is needed for GP formulation only when there are sums of log(1 + SIR) 
in the optimization problem, we have the following summary. 


Proposition 5.1. In the high-SIR regime, any combination of objectives 
(A)-(E) and constraints (a)—(e) in Table 5.3 (pick any one of the objectives 
and any subset of the constraints) is a power control optimization problem 
that can be solved by GP, that is, can be transformed into a convex optimiza- 
tion with efficient algorithms to compute the globally optimal power vector. 
When objectives (C)-(D) or constraints (c)-(d) do not appear, the power 
control optimization problem can be solved by GP in any SIR regime. 


In addition to efficient computation of the globally optimal power 
allocation with nonlinear objectives and constraints, GP can also be used for 
admission control based on the feasibility study described in [11], and for 
determining which QoS constraint is a performance bottleneck, that is, met 
tightly at the optimal power allocation.³ 


3 This is because most GP solution algorithms solve both the primal GP and its Lagrange 
dual problem, and by the complementary slackness condition, a resource constraint is tight 
at optimal power allocation when the corresponding optimal dual variable is nonzero. 


Extensions 


In wireless multihop networks, system throughput may be measured either by 
end-to-end transport layer utilities or by link layer aggregate throughput. GP 
application to the first approach has appeared in [10], and those to the second 
approach in [11]. Furthermore, delay and buffer overflow properties can also 
be accommodated in the constraints or objective function of GP-based power 
control. 


5.3.4 Power Control by Geometric Programming: 
Nonconvex Case 


If we maximize the total throughput Rsystem in the medium to low SIR case 
(i.e., when SIR is not much larger than 0 dB), the approximation of log(1 + 
SIR) as log SIR does not hold. Unlike SIR, which is an inverted posynomial, 
1 + SIR is not an inverted posynomial. Instead, 1/(1 + SIR) is a ratio between 
two posynomials: 

\[ \frac{f(\mathbf{P})}{g(\mathbf{P})} = \frac{\sum_{j \ne i} G_{ij} P_j + n_i}{\sum_j G_{ij} P_j + n_i}. \tag{5.19} \]

Minimizing, or upper bounding, a ratio between two posynomials belongs to a 
truly nonconvex class of problems known as complementary GP [1, 11] that is in 
general an NP-hard problem. An equivalent generalization of GP is signomial 
programming [1, 11]: minimizing a signomial subject to upper bound inequality 
constraints on signomials, where a signomial $s(\mathbf{x})$ is a sum of monomials, 
possibly with negative multiplicative coefficients: 
$s(\mathbf{x}) = \sum_{i=1}^{N} c_i g_i(\mathbf{x})$ where $\mathbf{c} \in \mathbb{R}^N$ and 
$g_i(\mathbf{x})$ are monomials.⁴ 


Successive Convex Approximation Method 


Consider the following nonconvex problem, 


\[
\begin{array}{lll}
\text{minimize} & f_0(\mathbf{x}) & \\
\text{subject to} & f_i(\mathbf{x}) \le 1, & i = 1, 2, \ldots, m,
\end{array}
\tag{5.20}
\]

where $f_0$ is convex without loss of generality,⁵ but the $f_i(\mathbf{x})$, 
$\forall i$, are nonconvex. Because directly solving this problem is NP-hard, we 
want to solve it by a series of approximations 
$\tilde{f}_i(\mathbf{x}) \approx f_i(\mathbf{x})$, $\forall \mathbf{x}$, each of which can be 
optimally solved in an easy way. It is known [46] that if the approximations 
satisfy the following three properties, then the solutions of this series of 
approximations converge to a point satisfying the necessary optimality 
Karush-Kuhn-Tucker (KKT) conditions of the original problem. 


(1) $f_i(\mathbf{x}) \le \tilde{f}_i(\mathbf{x})$ for all $\mathbf{x}$. 
(2) $f_i(\mathbf{x}_0) = \tilde{f}_i(\mathbf{x}_0)$ where $\mathbf{x}_0$ is the optimal solution 
of the approximated problem in the previous iteration. 
(3) $\nabla f_i(\mathbf{x}_0) = \nabla \tilde{f}_i(\mathbf{x}_0)$. 


4 An SP can always be converted into a complementary GP, because an inequality in SP, 
which can be written as $f_{i1}(\mathbf{x}) - f_{i2}(\mathbf{x}) \le 1$, where $f_{i1}$, $f_{i2}$ 
are posynomials, is equivalent to an inequality 
$f_{i1}(\mathbf{x}) / (1 + f_{i2}(\mathbf{x})) \le 1$ in complementary GP. 

5 If $f_0$ is nonconvex, we can move the objective function to the constraint by 
introducing auxiliary scalar variable $t$ and writing minimize $t$ subject to the 
additional constraint $f_0(\mathbf{x}) - t \le 0$. 


The following algorithm describes the generic successive approximation 
approach. Given a method to approximate $f_i(\mathbf{x})$ with $\tilde{f}_i(\mathbf{x})$, 
$\forall i$, around some point of interest $\mathbf{x}_0$, the following algorithm 
provides the output of a vector that satisfies the KKT conditions of the 
original problem. 


Algorithm 2. Successive approximation to a nonconvex problem. 


1. Choose an initial feasible point $\mathbf{x}^{(0)}$ and set $k = 1$. 
2. Form an approximated problem of (5.20) based on the previous point 
$\mathbf{x}^{(k-1)}$. 
3. Solve the $k$th approximated problem to obtain $\mathbf{x}^{(k)}$. 
4. Increment $k$ and go to step 2 until convergence to a stationary point. 


Single condensation method. Complementary GPs involve upper bounds 
on the ratio of posynomials as in (5.19); they can be turned into GPs by 
approximating the denominator of the ratio of posynomials, $g(\mathbf{x})$, with a 
monomial $\tilde{g}(\mathbf{x})$, but leaving the numerator $f(\mathbf{x})$ as a posynomial. 

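The following basic result can be readily proved using the 
arithmetic-mean-geometric-mean inequality. 


Lemma 5.1. Let $g(\mathbf{x}) = \sum_i u_i(\mathbf{x})$ be a posynomial. Then 

\[ g(\mathbf{x}) \ge \tilde{g}(\mathbf{x}) = \prod_i \Bigl( \frac{u_i(\mathbf{x})}{\alpha_i} \Bigr)^{\alpha_i}. \tag{5.21} \]

If, in addition, $\alpha_i = u_i(\mathbf{x}_0) / g(\mathbf{x}_0)$, $\forall i$, for any fixed 
positive $\mathbf{x}_0$, then $\tilde{g}(\mathbf{x}_0) = g(\mathbf{x}_0)$, and $\tilde{g}(\mathbf{x})$ 
is the best local monomial approximation to $g(\mathbf{x})$ near $\mathbf{x}_0$ in 
the sense of first-order Taylor approximation. 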
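
The condensation in the lemma is easy to carry out term by term. The sketch below builds the monomial from the weights $\alpha_i = u_i(x_0)/g(x_0)$ and checks two properties numerically: equality at $x_0$ and the lower-bound inequality at random positive points. The function names, the `(d, exponents)` term layout, and the sample posynomial (shaped like an interference-plus-noise denominator) are illustrative assumptions.

```python
import math
import random

def evaluate(terms, x):
    """Posynomial value: sum of d * prod_j x_j**a_j over (d, exponents) terms."""
    return sum(d * math.prod(xj ** aj for xj, aj in zip(x, a)) for d, a in terms)

def condense(terms, x0):
    """Return (coef, exps) of the monomial prod_i (u_i(x) / alpha_i)**alpha_i."""
    values = [d * math.prod(xj ** aj for xj, aj in zip(x0, a)) for d, a in terms]
    g0 = sum(values)
    alphas = [v / g0 for v in values]  # weights alpha_i = u_i(x0) / g(x0)
    coef = math.prod((d / al) ** al for (d, _), al in zip(terms, alphas))
    exps = tuple(sum(al * a[j] for (_, a), al in zip(terms, alphas))
                 for j in range(len(x0)))
    return coef, exps

def monomial_value(coef, exps, x):
    return coef * math.prod(xj ** e for xj, e in zip(x, exps))

# Hypothetical posynomial g(P) = 1.0 * P1 + 0.1 * P2 + 0.05.
g_terms = [(1.0, (1.0, 0.0)), (0.1, (0.0, 1.0)), (0.05, (0.0, 0.0))]
x0 = (0.5, 2.0)
coef, exps = condense(g_terms, x0)

rng = random.Random(1)
points = [(rng.uniform(0.01, 10.0), rng.uniform(0.01, 10.0)) for _ in range(1000)]
is_lower_bound = all(
    monomial_value(coef, exps, p) <= evaluate(g_terms, p) * (1.0 + 1e-12)
    for p in points
)
```

Because the monomial underestimates the denominator everywhere, replacing the denominator by it overestimates the ratio, which is the direction needed for the approximated problem to stay inside the original feasible set.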


The above lemma easily leads to the following 


Proposition 5.2. The approximation of a ratio of posynomials $f(\mathbf{x})/g(\mathbf{x})$ 
with $f(\mathbf{x})/\tilde{g}(\mathbf{x})$, where $\tilde{g}(\mathbf{x})$ is the monomial 
approximation of $g(\mathbf{x})$ using the arithmetic-geometric mean approximation 
of Lemma 5.1, satisfies the three conditions for the convergence of the 
successive approximation method. 
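
A one-variable toy problem shows the successive approximation loop with single condensation end to end. The problem below is hypothetical and chosen so that each condensed GP has a closed-form solution: minimize $1/x$ subject to $x^2/(2x + 3) \le 1$, whose true optimum is $x = 3$. The denominator $2x + 3$ is condensed at the current point using the arithmetic-geometric mean weights. This is a sketch under these assumptions, not the general algorithm with a GP solver.

```python
def condensed_step(x0):
    """Solve one condensed GP: minimize 1/x subject to x**2 / g~(x) <= 1,
    where g~ is the monomial condensation of g(x) = 2x + 3 at x0."""
    g0 = 2.0 * x0 + 3.0
    a1, a2 = 2.0 * x0 / g0, 3.0 / g0         # weights alpha_i = u_i(x0) / g(x0)
    c = (2.0 / a1) ** a1 * (3.0 / a2) ** a2  # g~(x) = c * x**a1
    # The constraint x**(2 - a1) <= c is tight at the optimum of minimize 1/x.
    return c ** (1.0 / (2.0 - a1))

def successive_condensation(x0, tol=1e-10, max_iter=100):
    """Iterate condensed GP solves until successive points agree within tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = condensed_step(x)
        if abs(x_next - x) <= tol:
            return x_next, k
        x = x_next
    return x, max_iter

# x = 1 is feasible for the original constraint (1 <= 5).
x_star, iterations = successive_condensation(1.0)
```

Each iterate stays feasible for the original constraint because the condensed denominator never exceeds the true one, and the iterates increase toward the point where $x^2 = 2x + 3$.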


Double condensation method. Another choice of approximation is to make 
a double monomial approximation for both the denominator and numerator 
in (5.19). However, in order to satisfy the three conditions for the convergence 
of the successive approximation method, a monomial approximation 
$\tilde{f}(\mathbf{x})$ for the numerator $f(\mathbf{x})$ should satisfy 
$f(\mathbf{x}) \le \tilde{f}(\mathbf{x})$. 


Applications to Power Control 


Figure 5.9 shows a block diagram of the approach of GP-based power control 
for a general SIR regime [64]. In the high SIR regime, we need to solve only 
one GP. In the medium to low SIR regimes, we solve, through a series of GPs, 
truly nonconvex power control problems that cannot be turned into a convex 
formulation. 


[Block diagram. High SIR: original problem, solve 1 GP. Medium to low SIR:
original problem, SP (complementary GP), condensed, solve 1 GP.]


Fig. 5.9 GP-based power control in different SIR regimes. 


GP-based power control problems in the medium to low SIR regimes be- 
come SP (or, equivalently, complementary GP), which can be solved by the 
single or double condensation method. We focus on the single condensation 
method here. Consider a representative problem formulation of maximizing 
total system throughput in a cellular wireless network subject to user rate 
and outage probability constraints in problem (5.18), which can be explicitly 
written out as 


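$$
\begin{array}{ll}
\text{minimize} & \prod_{i=1}^{N} \dfrac{1}{1 + \mathrm{SIR}_i} \\
\text{subject to} & (2^{T R_{i,\min}} - 1)\, \dfrac{1}{\mathrm{SIR}_i} \le 1, \quad i = 1, \ldots, N, \\
& (\mathrm{SIR}_{th})^{N-1} (1 - P_{o,i,\max}) \prod_{j \ne i}^{N} \dfrac{G_{ij} P_j}{G_{ii} P_i} \le 1, \quad i = 1, \ldots, N, \\
& P_i (P_{i,\max})^{-1} \le 1, \quad i = 1, \ldots, N.
\end{array}
\tag{5.22}
$$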


All the constraints are posynomials. However, the objective is not a posyno- 
mial, but a ratio between two posynomials as in (5.19). This power control 
problem can be solved by the condensation method by solving a series of 
GPs. Specifically, we have the following single-condensation algorithm. 


Algorithm 3. Single condensation GP power control. 


1. Evaluate the denominator posynomial of the objective function in (5.22)
with the given $P$.

2. Compute for each term $i$ in this posynomial,

$$ \alpha_i = \frac{\text{value of $i$th term in posynomial}}{\text{value of posynomial}}. $$

3. Condense the denominator posynomial of the (5.22) objective function
into a monomial using (5.21) with weights $\alpha_i$.

4. Solve the resulting GP using an interior point method.

5. Go to step 1 using $P$ of step 4.

6. Terminate the $k$th loop if $\| P^{(k)} - P^{(k-1)} \| \le \epsilon$, where $\epsilon$ is the error
tolerance for the exit condition.

Fig. 5.10 Maximized total system throughput achieved by the (single) condensation
method for 500 different initial feasible vectors (Example 5.6). Each point represents a
different experiment with a different initial power vector.
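The loop structure of Algorithm 3 can be seen on a toy problem. The sketch below is not the power control problem (5.22): it assumes a hypothetical one-variable objective, minimize $(x^2 + 4)/(1 + x)$ over $x > 0$, chosen because each condensed GP then has a closed-form solution and no interior point solver is needed. Condensing the denominator $1 + x$ at the current point gives a monomial proportional to $x^{\alpha_2}$ with $\alpha_2 = x/(1+x)$, and minimizing $(x^2 + 4)/x^{\alpha_2}$ yields $x^2 = 4\alpha_2/(2 - \alpha_2)$. All names are illustrative.

```python
import math

def condensation_weights(x):
    """Weights alpha_i = u_i(x)/g(x) for the denominator g(x) = 1 + x."""
    g = 1.0 + x
    return 1.0 / g, x / g

def solve_condensed_gp(alpha2):
    """Minimize (x**2 + 4) / g_tilde(x), with g_tilde proportional to x**alpha2.

    The condensed objective is a posynomial, hence convex in y = log x;
    setting its derivative to zero gives x**2 = 4*alpha2 / (2 - alpha2).
    """
    return math.sqrt(4.0 * alpha2 / (2.0 - alpha2))

def single_condensation(x0, eps=1e-10, max_iter=1000):
    """Repeat condense-and-solve until successive iterates differ by at most eps."""
    x = x0
    for k in range(1, max_iter + 1):
        _, alpha2 = condensation_weights(x)   # steps 1-2: evaluate and weigh
        x_new = solve_condensed_gp(alpha2)    # steps 3-4: condense and solve the GP
        if abs(x_new - x) <= eps:             # step 6: exit condition
            return x_new, k
        x = x_new                             # step 5: restart from the new point
    return x, max_iter

x_star, iterations = single_condensation(x0=5.0)
```

For this toy objective the iterates settle at the stationary point $x = \sqrt{5} - 1$ of the original ratio, which is also its global minimizer.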


As condensing the denominator of the objective replaces it with a monomial
that never exceeds it (Lemma 5.1), the condensed objective is an upper bound
on the objective of (5.22), that is, an underestimate of the achievable
throughput. Each GP in the condensation iteration loop tries
to improve the accuracy of the approximation to a particular minimum in
the original feasible region. All three conditions for convergence are satisfied,
and the algorithm is convergent. Empirically, through extensive numerical
experiments, we observe that it almost always computes the globally optimal
power allocation.


Example 5.6. Single condensation example. We consider a wireless cellular
network with three users. Let $T = 10^{-6}$ s, $G_{ii} = 1.5$, and generate $G_{ij}$, $i \ne j$,
as independent random variables uniformly distributed between 0 and 0.3.
Threshold SIR is $\mathrm{SIR}_{th} = -10$ dB, and minimal data rate requirements are
100 kbps, 600 kbps, and 1000 kbps for logical links 1, 2, and 3, respectively.
Maximal outage probabilities are 0.01 for all links, and maximal transmit
powers are 3 mW, 4 mW, and 5 mW for links 1, 2, and 3, respectively. For
each instance of problem (5.22), we pick a random initial feasible power
vector $P$ uniformly between 0 and $P_{\max}$. Figure 5.10 compares the maximized
total network throughput achieved over 500 sets of experiments with different
initial vectors. With the (single) condensation method, SP converges to
different optima over the entire set of experiments, achieving (or coming very
close to) the global optimum at 5290 bps 96% of the time and a local
optimum at 5060 bps 4% of the time. The average number of GP iterations
required by the condensation method over the same set of experiments is 15
if an extremely tight exit condition is picked for SP condensation iteration:
$\epsilon = 1 \times 10^{-10}$. This average can be substantially reduced by using a larger $\epsilon$;
for example, increasing $\epsilon$ to $1 \times 10^{-2}$ requires on average only 4 GPs.


We have thus far discussed a power control problem (5.22) where the 
objective function needs to be condensed. The method is also applicable if 
some constraint functions are signomials and need to be condensed [14, 35]. 


5.3.5 Distributed Algorithm 


A limitation for GP-based power control in ad hoc networks (without base 
stations) is the need for centralized computation (e.g., by interior point
methods). The GP formulations of power control problems can also be solved by
a new distributed algorithm for GP. The basic idea is that each
user solves its own local optimization problem and the coupling among users 
is taken care of by message passing among the users. Interestingly, the spe- 
cial structure of coupling for the problem at hand (all coupling among the 
logical links can be lumped together using interference terms) allows one to 
further reduce the amount of message passing among the users. Specifically, 
we use a dual decomposition method to decompose a GP into smaller sub- 
problems whose solutions are jointly and iteratively coordinated by the use 
of dual variables. The key step is to introduce auxiliary variables and to add 
extra equality constraints, thus transferring the coupling in the objective to 
coupling in the constraints, which can be solved by introducing “consistency
pricing” (in contrast to “congestion pricing”). We illustrate this idea through
an unconstrained GP followed by an application of the technique to power 
control. 


Distributed Algorithm for GP 


Suppose we have the following unconstrained standard form GP in $x > 0$,

$$ \text{minimize} \quad \sum_i f_i(x_i, \{x_j\}_{j \in I(i)}), \tag{5.23} $$

where $x_i$ denotes the local variable of the $i$th user, $\{x_j\}_{j \in I(i)}$ denotes the
coupled variables from other users, and $f_i$ is either a monomial or posynomial.
Making a change of variable $y_i = \log x_i$, $\forall i$, in the original problem, we obtain

$$ \text{minimize} \quad \sum_i f_i(e^{y_i}, \{e^{y_j}\}_{j \in I(i)}). $$

We now rewrite the problem by introducing auxiliary variables $y_{ij}$ for the
coupled arguments and additional equality constraints to enforce consistency:

$$
\begin{array}{ll}
\text{minimize} & \sum_i f_i(e^{y_i}, \{e^{y_{ij}}\}_{j \in I(i)}) \\
\text{subject to} & y_{ij} = y_j, \quad \forall j \in I(i), \ \forall i.
\end{array}
\tag{5.24}
$$

Each $i$th user controls the local variables $(y_i, \{y_{ij}\}_{j \in I(i)})$. Next, the
Lagrangian of (5.24) is formed as

$$
\begin{aligned}
L(\{y_i\}, \{y_{ij}\}; \{\gamma_{ij}\}) &= \sum_i f_i(e^{y_i}, \{e^{y_{ij}}\}_{j \in I(i)}) + \sum_i \sum_{j \in I(i)} \gamma_{ij} (y_j - y_{ij}) \\
&= \sum_i L_i(y_i, \{y_{ij}\}; \{\gamma_{ij}\}),
\end{aligned}
$$

where

$$ L_i(y_i, \{y_{ij}\}; \{\gamma_{ij}\}) = f_i(e^{y_i}, \{e^{y_{ij}}\}_{j \in I(i)}) + \Big( \sum_{j : i \in I(j)} \gamma_{ji} \Big) y_i - \sum_{j \in I(i)} \gamma_{ij} y_{ij}. \tag{5.25} $$

The minimization of the Lagrangian with respect to the primal variables
$(\{y_i\}, \{y_{ij}\})$ can be done simultaneously and distributively by each user in
parallel. In the more general case where the original problem (5.23) is
constrained, the additional constraints can be included in the minimization at
each $L_i$.

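In addition, the following master Lagrange dual problem has to be solved
to obtain the optimal dual variables or consistency prices $\{\gamma_{ij}\}$,

$$ \underset{\{\gamma_{ij}\}}{\text{maximize}} \quad g(\{\gamma_{ij}\}), \tag{5.26} $$

where

$$ g(\{\gamma_{ij}\}) = \sum_i \min_{y_i, \{y_{ij}\}} L_i(y_i, \{y_{ij}\}; \{\gamma_{ij}\}). $$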


Note that the transformed primal problem (5.24) is convex with zero duality
gap; hence the Lagrange dual problem indeed solves the original standard
GP problem. A simple way to solve the maximization in (5.26) is with the
following subgradient update for the consistency prices,

$$ \gamma_{ij}(t + 1) = \gamma_{ij}(t) + \delta(t) \left( y_j(t) - y_{ij}(t) \right). \tag{5.27} $$

Appropriate choice of the stepsize $\delta(t) > 0$, for example, $\delta(t) = \delta_0/t$ for some
constant $\delta_0 > 0$, leads to convergence of the dual algorithm.

Summarizing, the $i$th user has to: (i) minimize the function $L_i$ in (5.25)
involving only local variables, upon receiving the updated dual variables
$\{\gamma_{ji}, j : i \in I(j)\}$, and (ii) update the local consistency prices $\{\gamma_{ij}, j \in I(i)\}$
with (5.27), and broadcast the updated prices to the coupled users.
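The two roles of each user, local minimization and price update, can be traced on a toy instance. The sketch below assumes a hypothetical two-user GP, minimize $x_1^2 + 1/(x_1 x_2) + x_2$, where user 1 owns $f_1 = x_1^2 + 1/(x_1 x_2)$ and keeps a copy $y_{12}$ of $y_2 = \log x_2$, and user 2 owns $f_2 = x_2$. The closed-form local minimizers below were derived for this toy problem only; with the sign convention of (5.25) its single consistency price turns out negative. Function names and the stepsize constant are illustrative.

```python
import math

def user1_minimize(gamma):
    """User 1 minimizes L_1 = exp(2*y1) + exp(-y1 - y12) - gamma*y12.

    Stationarity gives exp(-y1 - y12) = -gamma and exp(2*y1) = -gamma/2,
    which requires gamma < 0 in this toy problem.
    """
    mu = -gamma
    y1 = 0.5 * math.log(mu / 2.0)
    y12 = -y1 - math.log(mu)
    return y1, y12

def user2_minimize(gamma):
    """User 2 minimizes L_2 = exp(y2) + gamma*y2, so exp(y2) = -gamma."""
    return math.log(-gamma)

def consistency_pricing(gamma0=-1.0, delta0=1.0, iterations=200):
    """Dual decomposition with the subgradient price update (5.27)."""
    gamma = gamma0
    for t in range(1, iterations + 1):
        _, y12 = user1_minimize(gamma)        # local step at user 1
        y2 = user2_minimize(gamma)            # local step at user 2
        gamma += (delta0 / t) * (y2 - y12)    # price update, stepsize delta0/t
    y1, y12 = user1_minimize(gamma)
    y2 = user2_minimize(gamma)
    return math.exp(y1), math.exp(y2), y2 - y12

x1, x2, gap = consistency_pricing()
```

Solving the toy problem directly gives $x_2 = 2^{1/5}$ and $x_1 = 2^{-2/5}$, which the iteration approaches as the consistency gap $y_2 - y_{12}$ shrinks.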


Applications to Power Control 


As an illustrative example, we maximize the total system throughput in the 
high SIR regime with constraints local to each user. If we directly applied 
the distributed approach described in the last section, the resulting algorithm 
would require knowledge by each user of the interfering channels and inter- 
fering transmit powers, which would translate into a large amount of message 
passing. To obtain a practical distributed solution, we can leverage the
structures of power control problems at hand, and instead keep a local copy of
each of the effective received powers $P_{ij}^R = G_{ij} P_j$. Again using problem (5.18)
as an example formulation and assuming high SIR, we can write the problem
as (after the log change of variable, with tildes denoting logarithms)

$$
\begin{array}{ll}
\text{minimize} & \sum_i \log \left( G_{ii}^{-1} \exp(-\tilde{P}_i) \left( \sum_{j \ne i} \exp(\tilde{P}_{ij}^R) + \sigma^2 \right) \right) \\
\text{subject to} & \tilde{P}_{ij}^R = \tilde{G}_{ij} + \tilde{P}_j.
\end{array}
\tag{5.28}
$$

Constraints are local to each user, for example, (a), (d), and (e) in Table 5.3.
The partial Lagrangian is

$$
L = \sum_i \log \left( G_{ii}^{-1} \exp(-\tilde{P}_i) \left( \sum_{j \ne i} \exp(\tilde{P}_{ij}^R) + \sigma^2 \right) \right)
+ \sum_i \sum_{j \ne i} \gamma_{ij} \left( \tilde{P}_{ij}^R - (\tilde{G}_{ij} + \tilde{P}_j) \right),
\tag{5.29}
$$

and the local $i$th Lagrangian function in (5.29) is distributed to the $i$th user,
from which the dual decomposition method can be used to determine the
optimal power allocation $P^*$. The distributed power control algorithm is
summarized as follows.


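Algorithm 4. Distributed power allocation update to maximize $R_{system}$.


At each iteration $t$:

1. The $i$th user receives the term $\left( \sum_{j \ne i} \gamma_{ji}(t) \right)$ involving the dual
variables from the interfering users by message passing and minimizes the
following local Lagrangian with respect to $\tilde{P}_i(t)$, $\{\tilde{P}_{ij}^R(t)\}_j$ subject to the local
constraints.

$$
\begin{aligned}
L_i\left( \tilde{P}_i(t), \{\tilde{P}_{ij}^R(t)\}_j; \{\gamma_{ij}(t)\}_j \right)
&= \log \left( G_{ii}^{-1} \exp(-\tilde{P}_i(t)) \left( \sum_{j \ne i} \exp(\tilde{P}_{ij}^R(t)) + \sigma^2 \right) \right) \\
&\quad + \sum_{j \ne i} \gamma_{ij}(t) \tilde{P}_{ij}^R(t) - \Big( \sum_{j \ne i} \gamma_{ji}(t) \Big) \tilde{P}_i(t).
\end{aligned}
$$

2. The $i$th user estimates the effective received power from each of the
interfering users $P_{ij}^R(t) = G_{ij} P_j(t)$ for $j \ne i$, updates the dual variable by

$$ \gamma_{ij}(t + 1) = \gamma_{ij}(t) + (\delta_0/t) \left( \tilde{P}_{ij}^R(t) - \log G_{ij} P_j(t) \right), \tag{5.30} $$

and then broadcasts them by message passing to all interfering users in the
system.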


Example 5.7. Distributed GP power control. We apply the distributed
algorithm to solve the above power control problem for three logical links with
$G_{ij} = 0.2$, $i \ne j$, $G_{ii} = 1$, $\forall i$, maximal transmit powers of 6 mW, 7 mW, and
7 mW for links 1, 2, and 3 respectively. Figure 5.11 shows the convergence of
the dual objective function towards the globally optimal total throughput of
the network. Figure 5.12 shows the convergence of the two auxiliary variables
in links 1 and 3 towards the optimal solutions.


[Plot: dual objective function versus iteration.]


Fig. 5.11 Convergence of the dual objective function through distributed algorithm
(Example 5.7).

Fig. 5.12 Convergence of the consistency constraints through distributed algorithm
(Example 5.7).


5.3.6 Concluding Remarks and Future Directions 


Power control problems with nonlinear objective and constraints may seem 
to be difficult, NP-hard problems to solve for global optimality. However, 
when SIR is much larger than 0 dB, GP can be used to turn these prob- 
lems into intrinsically tractable convex formulations, accommodating a vari- 
ety of possible combinations of objective and constraint functions involving 
data rate, delay, and outage probability. Then interior point algorithms can 
efficiently compute the globally optimal power allocation even for a large 
network. Feasibility analysis of GP naturally leads to admission control and 
pricing schemes. When the high SIR approximation cannot be made, these 
power control problems become SP and may be solved by the heuristic of 
the condensation method through a series of GPs. Distributed optimal algo- 
rithms for GP-based power control in multihop networks can also be carried 
out through message passing. 

Several challenging research issues in the low-SIR regime remain to be 
further explored. These include, for example, the reduction of SP solution 
complexity (e.g., by using high-SIR approximation to obtain the initial power 
vector and by solving the series of GPs only approximately except the last 
GP), and the combination of SP solution and distributed algorithm for dis- 
tributed power control in low SIR regime. We also note that other approaches 
to tackle nonconvex power control problems have been studied, for example,
the use of a particular utility function of rate to turn the problem into a
convex one [28].

5.4 DSL Spectrum Management


5.4.1 Introduction 


Digital subscriber line (DSL) technologies transform traditional voice-band 
copper channels into high-bandwidth data pipes, which are currently capable 
of delivering data rates up to several Mbps per twisted-pair over a distance 
of about 10 kft. The major obstacle for performance improvement in today’s 
DSL systems (e.g., ADSL and VDSL) is crosstalk, which is the interference 
generated between different lines in the same binder. The crosstalk is typically 
10-20 dB larger than the background noise, and direct crosstalk cancellation 
(e.g., [6, 27]) may not be feasible in many cases due to complexity issues or 
as a result of unbundling. To mitigate the detriments caused by crosstalk, 
static spectrum management which mandates spectrum mask or flat power 
backoff across all frequencies (i.e., tones) has been implemented in the current 
system. 

Dynamic spectrum management (DSM) techniques, on the other hand, 
can significantly improve data rates over the current practice of static spec- 
trum management. Within the current capability of the DSL modems, each 
modem has the capability to shape its own power spectrum density (PSD) 
across different tones, but can only treat crosstalk as background noise (i.e., 
no signal level coordination, such as vector transmission or iterative
decoding, is allowed), and each modem is inherently a single-input single-output
communication system. The objective would be to optimize the PSD of all
users on all tones (i.e., continuous power loading or discrete bit loading), such 
that they are “compatible” with each other and the system performance (e.g., 
weighted rate sum as discussed below) is maximized. 

Compared to power control in wireless networks treated in the last sec- 
tion, the channel gains are not time-varying in DSL systems, but the prob- 
lem dimension increases tremendously because there are many “tones” (or 
frequency carriers) over which transmission takes place. Nonconvexity still 
remains a major technical challenge, and high SIR approximation in gen- 
eral cannot be made. However, utilizing the specific structures of the prob- 
lem (e.g., the interference channel gain values), an efficient and distributed 
heuristic is shown to perform close to the optimum in many realistic DSL 
network scenarios. 

Following [7], this section presents a new algorithm for spectrum manage- 
ment in frequency selective interference channels for DSL, called autonomous 
spectrum balancing (ASB). It is the first DSL spectrum management algo- 
rithm that satisfies all of the following requirements for performance and 
complexity. It is autonomous (distributed algorithm across the users without
explicit information exchange) with linear complexity, while provably
convergent, and comes close to the globally optimal rate region in practice. ASB
overcomes the bottlenecks in the state-of-the-art algorithms in DSM,
including IW, OSB, and ISB summarized below.

Let K be the number of tones and N the number of users (lines). The
iterative waterfilling (IW) algorithm [74] is among the first DSM algorithms
proposed. In IW, each user views any crosstalk experienced as additive
Gaussian noise, and seeks to maximize its data rate by “waterfilling” over the
aggregated noise plus interference. No information exchange is needed among
users, and all the actions are completely autonomous. IW leads to a great
performance increase over the static approach, and enjoys a low complexity
that is linear in N. However, the greedy nature of IW leads to a performance
far from optimal in near-far scenarios such as mixed CO/RT deployment
and upstream VDSL.

To address this, an optimal spectrum balancing (OSB) algorithm [9] has 
been proposed, which finds the best possible spectrum management solution 
under the current capabilities of the DSL modems. OSB avoids the selfish be- 
haviors of individual users by aiming at the maximization of a total weighted 
sum of user rates, which corresponds to a boundary point of the achievable 
rate region. On the other hand, OSB has a high computational complex- 
ity that is exponential in N, which quickly leads to intractability when N 
is larger than 6. Moreover, it is a completely centralized algorithm where a 
spectrum management center at the central office needs to know the global 
information (i.e., all the noise PSDs and crosstalk channel gains in the same 
binder) to perform the algorithm. 

As an improvement to the OSB algorithm, an iterative spectrum balancing 
(ISB) algorithm [8] has been proposed, which is based on a weighted sum 
rate maximization similar to OSB. Different from OSB, ISB performs the 
optimization iteratively through users, which leads to a quadratic complexity 
in N. Close to optimal performance can be achieved by the ISB algorithm 
in most cases. However, each user still needs to know the global information 
as in OSB, thus ISB is still a centralized algorithm and is considered to be 
impractical in many cases. 

This section presents the ASB algorithm [7], which attains near-optimal 
performance in an implementable way. The basic idea is to use the concept 
of a reference line to mimic a “typical” victim line in the current binder. 
By setting the power spectrum level to protect the reference line, a good 
balance between selfish and global maximizations can be achieved. The ASB 
algorithm enjoys a linear complexity in N and K, and can be implemented
in a completely autonomous way. We prove the convergence of ASB under 
both sequential and parallel updates. 

Table 5.4 compares various aspects of different DSM algorithms. Utilizing
the structures of the DSL problem, in particular, the lack of channel variation
and user mobility, is the key to provide a linear complexity, distributed,
convergent, and almost optimal solution to this coupled, nonconvex optimization
problem.

Table 5.4 Comparison of different DSM algorithms

Algorithm   Operation     Complexity   Performance    Reference
IW          Autonomous    O(KN)        Suboptimal     [74]
OSB         Centralized   O(K e^N)     Optimal        [9]
ISB         Centralized   O(K N^2)     Near optimal   [8]
ASB         Autonomous    O(KN)        Near optimal   [7]


5.4.2 System Model 


Using the notation as in [9, 8], we consider a DSL bundle with $\mathcal{N} = \{1, \ldots, N\}$
modems (i.e., lines, users) and $\mathcal{K} = \{1, \ldots, K\}$ tones. Assume discrete
multitone (DMT) modulation is employed by all modems; transmission can be
modeled independently on each tone as

$$ y_k = H_k x_k + z_k. $$

The vector $x_k = \{x_k^n, n \in \mathcal{N}\}$ contains transmitted signals on tone $k$, where
$x_k^n$ is the signal transmitted onto line $n$ at tone $k$. $y_k$ and $z_k$ have similar
structures. $y_k$ is the vector of received signals on tone $k$. $z_k$ is the vector of
additive noise on tone $k$ and contains thermal noise, alien crosstalk,
single-carrier modems, radio frequency interference, and so on. $H_k = [h_k^{n,m}]_{n,m \in \mathcal{N}}$ is
the $N \times N$ channel transfer matrix on tone $k$, where $h_k^{n,m}$ is the channel from
TX $m$ to RX $n$ on tone $k$. The diagonal elements of $H_k$ contain the direct
channels whereas the off-diagonal elements contain the crosstalk channels. We
denote the transmit power spectrum density (PSD) $s_k^n = E\{|x_k^n|^2\}$. In the
last section’s notation for single-carrier systems, we would have $s_k^n = P_n$, $\forall k$.
For convenience we denote the vector containing the PSD of user $n$ on all
tones as $s^n = \{s_k^n, k \in \mathcal{K}\}$. We denote the DMT symbol rate as $f_s$.

Assume that each modem treats interference from other modems as noise.
When the number of interfering modems is large, the interference can be
well approximated by a Gaussian distribution. Under this assumption the
achievable bit loading of user $n$ on tone $k$ is

$$ b_k^n = \log \left( 1 + \frac{1}{\Gamma} \frac{s_k^n}{\sum_{m \ne n} \alpha_k^{n,m} s_k^m + \sigma_k^n} \right), \tag{5.31} $$

where $\alpha_k^{n,m} = |h_k^{n,m}|^2 / |h_k^{n,n}|^2$ is the normalized crosstalk channel gain, and
$\sigma_k^n$ is the noise power density normalized by the direct channel gain $|h_k^{n,n}|^2$.
Here $\Gamma$ denotes the SINR-gap to capacity, which is a function of the
desired BER, coding gain, and noise margin [61]. Without loss of generality, we
assume $\Gamma = 1$. The data rate on line $n$ is thus

$$ R^n = f_s \sum_{k \in \mathcal{K}} b_k^n. \tag{5.32} $$

Each modem $n$ is typically subject to a total power constraint $P^n$, due to the
limitations on each modem’s analogue frontend.

$$ \sum_{k \in \mathcal{K}} s_k^n \le P^n. \tag{5.33} $$
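The rate model (5.31)-(5.33) translates directly into a few lines of code. The sketch below is illustrative only: the function names and the list-based data layout are assumptions, and the logarithm is taken base 2 so that the bit loading is counted in bits.

```python
import math

def bit_loading(psd, alpha, sigma, n, gap=1.0):
    """Bit loading b_k^n of (5.31) for line n on a single tone.

    psd[m]      : transmit PSD s_k^m of line m on this tone
    alpha[n][m] : normalized crosstalk gain from line m into line n
    sigma[n]    : normalized noise power of line n
    gap         : SINR gap Gamma (1 by default, as assumed in the text)
    """
    interference = sum(alpha[n][m] * psd[m] for m in range(len(psd)) if m != n)
    return math.log2(1.0 + psd[n] / (gap * (interference + sigma[n])))

def data_rate(psd_tones, alpha_tones, sigma_tones, n, symbol_rate):
    """Data rate R^n of (5.32): symbol rate times the sum of bit loadings."""
    return symbol_rate * sum(
        bit_loading(psd, alpha, sigma, n)
        for psd, alpha, sigma in zip(psd_tones, alpha_tones, sigma_tones))

def within_power_budget(psd_tones, n, total_power):
    """Total power constraint (5.33) for line n."""
    return sum(psd[n] for psd in psd_tones) <= total_power
```

For example, two lines with unit PSD, crosstalk gain 0.5, and noise 0.5 see an SINR of 1 and therefore one bit per tone.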


5.4.3 Spectrum Management Problem Formulation 


One way to define the spectrum management problem is to start with the
following optimization problem.

$$
\begin{array}{ll}
\text{maximize} & R^1 \\
\text{subject to} & R^n \ge R^{n,target}, \quad \forall n > 1, \\
& \sum_{k \in \mathcal{K}} s_k^n \le P^n, \quad \forall n.
\end{array}
\tag{5.34}
$$

Here $R^{n,target}$ is the target rate constraint of user $n$. In other words, we
try to maximize the achievable rate of user 1, under the condition that all
other users achieve their target rates $R^{n,target}$. The mutual interference in
(5.31) causes Problem (5.34) to be coupled across users on each tone, and
the individual total power constraint causes Problem (5.34) to be coupled
across tones as well.

Moreover, the objective function in Problem (5.34) is nonconvex due to 
the coupling of interference, and the convexity of the rate region cannot be 
guaranteed in general. 

However, it has been shown in [75] that the duality gap of Problem (5.34)
goes to zero when the number of tones K gets large
(e.g., for VDSL), thus Problem (5.34) can be solved by the dual decomposition
method, which brings the complexity as a function of K down to linear.
Moreover, a frequency-sharing property ensures the rate region is convex
with large enough K, and each boundary point of the
rate region can be achieved by a weighted rate maximization as (following
[9]),

$$
\begin{array}{ll}
\text{maximize} & R^1 + \sum_{n > 1} w^n R^n \\
\text{subject to} & \sum_{k \in \mathcal{K}} s_k^n \le P^n, \quad \forall n \in \mathcal{N},
\end{array}
\tag{5.35}
$$

such that the nonnegative weight coefficient $w^n$ is adjusted to ensure that the
target rate constraint of user $n$ is met. Without loss of generality, here we
define $w^1 = 1$. By changing the rate constraints $R^{n,target}$ for users $n > 1$ (or
equivalently, changing the weight coefficients $w^n$ for $n > 1$), every boundary
point of the convex rate region can be traced.

We observe that at the optimal solutions of (5.34), each user chooses a
PSD level that leads to a good balance of maximization of its own rate
and minimization of the damages it causes to the other users. To accurately
calculate the latter, the user needs to know the global information of the noise
PSDs and crosstalk channel gains. However, if we aim at a less aggressive
objective and only require each user to give enough protection to the other
users in the binder while maximizing its own rate, then global information
may not be needed. Indeed, we can introduce the concept of a “reference
line”, a virtual line that represents a “typical” victim in the current binder.
Then instead of solving (5.34), each user tries to maximize the achievable
data rate on the reference line, subject to its own data rate and total power
constraint. Define the rate of the reference line to user $n$ as

$$ R^{n,ref} = \sum_{k \in \mathcal{K}} \tilde{b}_k^n = \sum_{k \in \mathcal{K}} \log \left( 1 + \frac{\tilde{s}_k}{\tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k} \right). $$

The coefficients $\{\tilde{s}_k, \tilde{\sigma}_k, \tilde{\alpha}_k^n, \forall k, n\}$ are parameters of the reference line and
can be obtained from field measurement. They represent the conditions of a
“typical” victim user in an interference channel (here a binder of DSL lines),
and are known to the users a priori. They can be further updated on a much
slower time scale through channel measurement data. User $n$ then wants to
solve the following problem local to itself,


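$$
\begin{array}{ll}
\text{maximize} & R^{n,ref} \\
\text{subject to} & R^n \ge R^{n,target}, \\
& \sum_{k \in \mathcal{K}} s_k^n \le P^n.
\end{array}
\tag{5.36}
$$

By using Lagrangian relaxation on the rate target constraint in Problem
(5.36) with a weight coefficient (dual variable) $w^n$, the relaxed version of
(5.36) is

$$
\begin{array}{ll}
\text{maximize} & w^n R^n + R^{n,ref} \\
\text{subject to} & \sum_{k \in \mathcal{K}} s_k^n \le P^n.
\end{array}
\tag{5.37}
$$

The weight coefficient $w^n$ needs to be adjusted to enforce the rate constraint.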


5.4.4 ASB Algorithms 


We first introduce the basic version of the ASB algorithm (ASB-I), where each 
user $n$ chooses the PSD $s^n$ to solve (5.36), and updates the weight coefficient
$w^n$ to enforce the target rate constraint. Then we introduce a variation of the
ASB algorithm (ASB-II) that enjoys even lower computational complexity
and provable convergence.

ASB-I


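For each user $n$, we replace the original optimization (5.36) with the Lagrange
dual problem

$$ \max_{\lambda^n \ge 0,\ \sum_k s_k^n \le P^n} \ \sum_{k \in \mathcal{K}} \max_{s_k^n} J_k^n(w^n, \lambda^n, s_k^n, s_k^{-n}), \tag{5.38} $$

where

$$ J_k^n(w^n, \lambda^n, s_k^n, s_k^{-n}) = w^n b_k^n + \tilde{b}_k^n - \lambda^n s_k^n \tag{5.39} $$

and $s_k^{-n}$ collects the PSDs of all users other than $n$ on tone $k$.
By introducing the dual variable $\lambda^n$, we decouple (5.36) into several smaller
subproblems, one for each tone, and define $J_k^n$ as user $n$’s objective function
on tone $k$. The optimal PSD that maximizes $J_k^n$ for given $w^n$ and $\lambda^n$ is

$$ s_k^{n,I}(w^n, \lambda^n, s_k^{-n}) = \arg\max_{s_k^n \in [0, P^n]} J_k^n(w^n, \lambda^n, s_k^n, s_k^{-n}), \tag{5.40} $$

which can be found by solving the first-order condition,

$$ \partial J_k^n(w^n, \lambda^n, s_k^n, s_k^{-n}) / \partial s_k^n = 0, $$

which leads to

$$ \frac{w^n}{s_k^n + \sum_{m \ne n} \alpha_k^{n,m} s_k^m + \sigma_k^n} - \frac{\tilde{\alpha}_k^n \tilde{s}_k}{(\tilde{s}_k + \tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k)(\tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k)} - \lambda^n = 0. \tag{5.41} $$

Note that (5.41) can be simplified into a cubic equation that has three
solutions. The optimal PSD can be found by substituting these three solutions
back into the objective function $J_k^n(w^n, \lambda^n, s_k^n, s_k^{-n})$ as well as checking the
boundary solutions $s_k^n = 0$ and $s_k^n = P^n$, and picking the one that yields the
largest value of $J_k^n$.
The user then updates $\lambda^n$ to enforce the power constraint, and updates
$w^n$ to enforce the target rate constraint. The complete algorithm is given as
follows, where $\epsilon_\lambda$ and $\epsilon_w$ are small stepsizes for updating $\lambda^n$ and $w^n$.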


Algorithm 5. Autonomous Spectrum Balancing.


repeat
    for each user $n = 1, \ldots, N$
        repeat
            for each tone $k = 1, \ldots, K$, find
                $s_k^{n,I} = \arg\max_{s_k^n \ge 0} J_k^n$
            $\lambda^n = \left[ \lambda^n + \epsilon_\lambda \left( \sum_k s_k^{n,I} - P^n \right) \right]^+$
            $w^n = \left[ w^n + \epsilon_w \left( R^{n,target} - \sum_k b_k^n \right) \right]^+$
        until convergence
    end
until convergence


ASB-II with Frequency-Selective Waterfilling


To obtain the optimal PSD in ASB-I (for fixed $\lambda^n$ and $w^n$), we have to find
the roots of a cubic equation. To reduce the computational complexity and
gain more insight into the solution structure, we assume that the reference
line operates in the high SIR regime whenever it is active: If $\tilde{s}_k > 0$, then
$\tilde{s}_k \gg \tilde{\sigma}_k \gg \tilde{\alpha}_k^n s_k^n$ for any feasible $s_k^n$, $n \in \mathcal{N}$, and $k \in \mathcal{K}$. This assumption
is motivated by our observations on optimal solutions in the DSL type of
interference channels. It means that the reference PSD is much larger than
the reference noise, which is in turn much larger than the interference from
user $n$. Then on any tone $k \in \bar{\mathcal{K}} = \{k \mid \tilde{s}_k > 0, k \in \mathcal{K}\}$, the reference line’s
achievable rate is

$$ \log \left( 1 + \frac{\tilde{s}_k}{\tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k} \right) \approx \log \left( \frac{\tilde{s}_k}{\tilde{\sigma}_k} \right) - \frac{\tilde{\alpha}_k^n s_k^n}{\tilde{\sigma}_k}, $$

and user $n$’s objective function on tone $k$ can be approximated by

$$ J_k^{n,II,1}(w^n, \lambda^n, s_k^n, s_k^{-n}) = w^n b_k^n - \lambda^n s_k^n - \frac{\tilde{\alpha}_k^n s_k^n}{\tilde{\sigma}_k} + \log \left( \frac{\tilde{s}_k}{\tilde{\sigma}_k} \right). $$

The corresponding optimal PSD is

$$ s_k^{n,II,1}(w^n, \lambda^n, s_k^{-n}) = \left[ \frac{w^n}{\lambda^n + \tilde{\alpha}_k^n / \tilde{\sigma}_k} - \sum_{m \ne n} \alpha_k^{n,m} s_k^m - \sigma_k^n \right]^+. \tag{5.42} $$
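The step from the high-SIR assumption to the waterfilling form (5.42) can be written out explicitly. The following is a sketch of that derivation, assuming natural logarithms and $\Gamma = 1$; the shorthand $I_k^n$ for the interference plus noise seen by user $n$ is introduced here only for brevity.

```latex
% Shorthand (not in the text): I_k^n = \sum_{m \ne n} \alpha_k^{n,m} s_k^m + \sigma_k^n.
\begin{align*}
\log\left(1 + \frac{\tilde{s}_k}{\tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k}\right)
  &\approx \log\frac{\tilde{s}_k}{\tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k}
  && \text{since } \tilde{s}_k \gg \tilde{\alpha}_k^n s_k^n + \tilde{\sigma}_k \\
  &= \log\frac{\tilde{s}_k}{\tilde{\sigma}_k}
     - \log\left(1 + \frac{\tilde{\alpha}_k^n s_k^n}{\tilde{\sigma}_k}\right) \\
  &\approx \log\frac{\tilde{s}_k}{\tilde{\sigma}_k}
     - \frac{\tilde{\alpha}_k^n s_k^n}{\tilde{\sigma}_k}
  && \text{since } \tilde{\sigma}_k \gg \tilde{\alpha}_k^n s_k^n .
\end{align*}
% With b_k^n = \log(1 + s_k^n / I_k^n), the first-order condition of the
% approximated objective in s_k^n is
\begin{align*}
\frac{w^n}{s_k^n + I_k^n} - \lambda^n - \frac{\tilde{\alpha}_k^n}{\tilde{\sigma}_k} = 0
\quad\Longrightarrow\quad
s_k^n = \frac{w^n}{\lambda^n + \tilde{\alpha}_k^n/\tilde{\sigma}_k} - I_k^n ,
\end{align*}
% and projecting onto s_k^n \ge 0 gives the [\,\cdot\,]^+ in (5.42).
```

The approximated objective is concave in $s_k^n$, so this stationary point, clipped at zero, is its maximizer; no cubic equation has to be solved.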


This is a waterfilling type of solution and is intuitively satisfying: the PSD
should be smaller when the power constraint is tighter (i.e., $\lambda^n$ is larger), or
the interference coefficient to the reference line $\tilde{\alpha}_k^n$ is higher, or the noise level
on the reference line $\tilde{\sigma}_k$ is smaller, or there is more interference plus noise
$\sum_{m \ne n} \alpha_k^{n,m} s_k^m + \sigma_k^n$ on the current tone. It is different from the conventional
waterfilling in that the water level in each tone is not only determined by the
dual variables $w^n$ and $\lambda^n$, but also by the parameters of the reference line,
$\tilde{\alpha}_k^n / \tilde{\sigma}_k$.

On the other hand, on any tone where the reference line is inactive, that
is, $k \in \bar{\mathcal{K}}^C = \{k \mid \tilde{s}_k = 0, k \in \mathcal{K}\}$, the objective function is

$$ J_k^{n,II,2}(w^n, \lambda^n, s_k^n, s_k^{-n}) = w^n b_k^n - \lambda^n s_k^n, $$

and the corresponding optimal PSD is

$$ s_k^{n,II,2}(w^n, \lambda^n, s_k^{-n}) = \left[ \frac{w^n}{\lambda^n} - \sum_{m \ne n} \alpha_k^{n,m} s_k^m - \sigma_k^n \right]^+. \tag{5.43} $$

This is the same solution as the iterative waterfilling.

The choice of optimal PSD in ASB-II can be summarized as the following.

$$
s_k^{n,II}(w^n, \lambda^n, s_k^{-n}) =
\begin{cases}
\left[ \dfrac{w^n}{\lambda^n + \tilde{\alpha}_k^n / \tilde{\sigma}_k} - \sum_{m \ne n} \alpha_k^{n,m} s_k^m - \sigma_k^n \right]^+, & k \in \bar{\mathcal{K}}, \\[2ex]
\left[ \dfrac{w^n}{\lambda^n} - \sum_{m \ne n} \alpha_k^{n,m} s_k^m - \sigma_k^n \right]^+, & k \in \bar{\mathcal{K}}^C.
\end{cases}
\tag{5.44}
$$

This is essentially a waterfilling type of solution, with different water levels
for different tones (frequencies). We call it frequency selective waterfilling.
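Frequency selective waterfilling is simple enough to sketch in code. The example below assumes one user with a fixed weight $w^n$ and uses bisection on $\lambda^n$ to meet the power budget; the bisection is an implementation choice made here for illustration (the text only says that $\lambda^n$ is adjusted to enforce the power constraint). All function and parameter names are hypothetical, and `noise[k]` stands for the interference plus noise $\sum_{m \ne n} \alpha_k^{n,m} s_k^m + \sigma_k^n$ on tone $k$.

```python
import math

def asb2_psd(w, lam, noise, ref_alpha, ref_sigma, ref_active):
    """Per-tone PSD of one user from the two cases of the ASB-II solution."""
    psd = []
    for k in range(len(noise)):
        # The reference-line term alpha~/sigma~ lowers the water level only
        # on tones where the reference line is active.
        penalty = ref_alpha[k] / ref_sigma[k] if ref_active[k] else 0.0
        psd.append(max(0.0, w / (lam + penalty) - noise[k]))
    return psd

def fit_lambda(w, total_power, noise, ref_alpha, ref_sigma, ref_active,
               lam_lo=1e-12, lam_hi=1e12, iters=200):
    """Bisect (geometrically) on lambda until the PSD uses the power budget."""
    def used(lam):
        return sum(asb2_psd(w, lam, noise, ref_alpha, ref_sigma, ref_active))
    if used(lam_lo) <= total_power:
        return lam_lo                      # power constraint is not binding
    lo, hi = lam_lo, lam_hi
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if used(mid) > total_power:        # too much power: raise the price
            lo = mid
        else:
            hi = mid
    return hi

# Two tones with interference-plus-noise 0.1 and 0.2 and a unit power budget.
noise = [0.1, 0.2]
lam_iw = fit_lambda(1.0, 1.0, noise, [0.0, 0.0], [1.0, 1.0], [False, False])
psd_iw = asb2_psd(1.0, lam_iw, noise, [0.0, 0.0], [1.0, 1.0], [False, False])
```

With the reference line inactive on both tones this reduces to conventional waterfilling; marking tone 0 as active with a nonzero $\tilde{\alpha}/\tilde{\sigma}$ moves power away from that tone.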


5.4.5 Convergence Analysis 


In this section, we show the convergence for both ASB-I and ASB-II, for the
case where users fix their weight coefficients $w^n$, which is also called rate
adaptive (RA) spectrum balancing [61] that aims at maximizing users’ rates
subject to power constraint.


Convergence in the Two-User Case 


The first result is on the convergence of the ASB-I algorithm, with fixed
$w = (w^1, w^2)$ and $\lambda = (\lambda^1, \lambda^2)$.


Proposition 5.3. The ASB-I algorithm converges in a two-user case under
fixed $w$ and $\lambda$, if users start from initial PSD values $(s_k^1, s_k^2) = (0, P^2)$ or
$(s_k^1, s_k^2) = (P^1, 0)$ on all tones.


The proof of Proposition 5.3 uses supermodular game theory [65] and 
strategy transformation similar to [32]. 

Now consider the ASB-II algorithm where two users sequentially optimize
their PSD levels under fixed values of $w$, but adjust $\lambda$ to enforce the power
constraint. Denote $s_k^{n,t}$ as the PSD of user $n$ in tone $k$ after iteration $t$, where
$\sum_k s_k^{n,t} = P^n$ is satisfied for any $n$ and $t$. One iteration is defined as one
round of updates of all users. We can show that


Proposition 5.4. The ASB-II algorithm globally converges to the unique
fixed point in a two-user system under fixed $w$, if $\max_k \alpha_k^{2,1} \max_k \alpha_k^{1,2} < 1$.


The convergence result of iterative waterfilling in the two-user case [74] is
a special case of Proposition 5.4 by setting $\tilde{s}_k = 0$, $\forall k$.

We further extend the convergence results to a system with an arbitrary
number $N > 2$ of users. We consider both sequential and parallel PSD updates of the
users. In the more realistic but harder-to-analyze parallel updates, time is
divided into slots, and each user $n$ updates the PSD simultaneously in each
time slot according to (5.44) based on the PSDs in the previous slot, where
the $\lambda^n$ is adjusted such that the power constraint is satisfied.


[Topology diagram: User 1 is a 5 km CO line; Users 2, 3, and 4 are RT lines;
every line terminates at customer premises (CP).]


Fig. 5.13 An example of mixed CO/RT deployment topology (Example 5.8). 


Proposition 5.5. Assume $\max_{n,m,k} \alpha_k^{n,m} < 1/(N - 1)$; then the ASB-II
algorithm globally converges (to the unique fixed point) in an $N$-user system
under fixed $w$, with either sequential or parallel updates.


Proposition 5.5 contains the convergence of iterative waterfilling in an N- 
user case with sequential updates (proved in [15]) as a special case of ASB 
convergence with sequential or parallel updates. Moreover, the convergence 
proof for the parallel updates turns out to be simpler than the one for sequen- 
tial updates. The proof extends that of Proposition 5.4, and can be found in 
[7]. 
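The hypothesis of Proposition 5.5 is easy to test on measured or simulated crosstalk gains. The helper below is a hypothetical sketch (its name and data layout are assumptions): it checks only the sufficient condition stated in the proposition, so a `False` result does not by itself imply divergence.

```python
def asb2_convergence_condition(alpha_tones):
    """Check max over n != m and k of alpha_k^{n,m} against 1/(N-1).

    alpha_tones[k][n][m] is the normalized crosstalk gain from line m into
    line n on tone k. Returns (condition_holds, largest_gain).
    """
    n_users = len(alpha_tones[0])
    worst = max(alpha[n][m]
                for alpha in alpha_tones
                for n in range(n_users)
                for m in range(n_users)
                if m != n)
    return worst < 1.0 / (n_users - 1), worst
```

For three users the threshold is 0.5, so a binder whose largest normalized crosstalk gain is 0.4 satisfies the condition and one with 0.6 does not.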


5.4.6 Simulation Results 


Example 5.8. Mired CO-RT DSL. Here we summarize a typical numerical 
example comparing the performances of ASB algorithms with IW, OSB, and 
ISB. We consider a standard mixed central office (CO) and remote terminal 
(RT) deployment. A four-user scenario has been selected to make a compar- 
ison with the highly complex OSB algorithm possible. As depicted in Figure 
5.13 the scenario consists of one CO distributed line, and three RT distributed 
lines. The target rates on RT1 and RT2 have both been set to 2 Mbps. For 
a variety of different target rates on RT3, the CO line attempts to maximize 
its own data rate either by transmitting at full power in IW, or by setting its 
corresponding weight $w_{co}$ to unity in OSB, ISB, and ASB. 

This produces the rate regions shown in Figure 5.14, which shows that ASB 
achieves near optimal performance similar to OSB and ISB, and significant 
gain over IW even though both ASB and IW are autonomous. For example, 
with a target rate of 1 Mbps on CO, the rate on RT3 reaches 7.3 Mbps 
under the ASB algorithm, which is a 121% increase compared with the 3.3 
Mbps achieved by IW. We have also performed extensive simulations (more 
than 10,000 scenarios) with different CO and RT positions, line lengths, 
and reference line parameters. We found that the performance of ASB is
very insensitive to the definition of the reference line: with a single choice of the
reference line we observe good performance in a broad range of scenarios.


Fig. 5.14 Rate regions obtained by ASB, IW, OSB, and ISB (Example 5.8). [Plot title: RT1 @ 2 Mbps, RT2 @ 2 Mbps.]


5.4.7 Concluding Remarks and Future Directions 


Dynamic spectrum management techniques can greatly improve the perfor- 
mance of DSL lines by inducing cooperation among interfering users in the 
same binder. For example, the iterative waterfilling algorithm is a completely 
autonomous DSM algorithm with linear complexity in the number of users 
and number of tones, but the performance could be far from optimal in the 
mixed CO/RT deployment scenario. The optimal spectrum balancing and 
iterative spectrum balancing algorithms achieve optimal and close to opti- 
mal performances, respectively, but have high complexities in terms of the 
number of users and are completely centralized. 

This section surveys an autonomous dynamic spectrum management
algorithm called autonomous spectrum balancing (ASB). ASB utilizes the concept of
a “reference line”, which mimics a typical victim line in the binder. By setting
the power spectrum level to protect the reference line, a good balance between
selfish and global maximizations can be achieved. Compared with IW, OSB,
and ISB, the ASB algorithm enjoys completely autonomous operation and low
(linear) complexity in both the number of users and number of tones.
Simulation shows that the ASB algorithm achieves close to optimal performance
and is robust to the choice of reference line parameters.

We conclude this section by highlighting the key ideas behind ASB. The
reference line represents the statistical average of all victims within a typical
network, and can be thought of as a “static pricing”. This differentiates the
ASB algorithm from power control algorithms in the wireless setting, where
pricing mechanisms have to be adaptive to the change of channel fading states 
and network topology, or Internet congestion control, where time-varying 
congestion pricing signals are used to align selfish interests for social welfare 
maximization. By using static pricing, no explicit message passing among the 
users is needed and the algorithm becomes autonomous across the users. This 
is possible because of the static nature of the channel gains in DSL networks. 

Mathematically, the surprisingly good rate region results of ASB mean 
that the specific engineering problem structures in this nonconvex and cou- 
pled optimization problem can be leveraged to provide a very effective ap- 
proximation solution algorithm. Furthermore, robustness of the attained rate 
region with respect to perturbation of the reference line parameters has been 
verified to be very strong. This means that the dependence of the values of the 
local maxima of this nonconvex optimization problem on crosstalk channel 
coefficients is sufficiently insensitive for the observed robustness to hold. 

There are several exciting further directions to pursue with ASB, for exam- 
ple, convergence conditions for ASB-I, extensions to intercarrier-interference 
cases, and bounds on optimality gap that are empirically verified to be very 
small. Interactions of ASB with link layer scheduling have resulted in further 
improvement of throughput in DSL networks [33, 67]. 


5.5 Internet Routing 


5.5.1 Introduction 


Most large IP (Internet protocol) networks run interior gateway protocols 
(IGPs) such as OSPF (open shortest path first) or IS-IS (intermediate 
system—intermediate system) that select paths based on link weights. Routers 
use these protocols to exchange link weights and construct a complete view 
of the topology inside the autonomous system (AS). Then, each router com- 
putes shortest paths (where the length of a path is the sum of the weights on 
the links) and creates a table that controls the forwarding of each IP packet 
to the next hop in its route. To handle the presence of multiple shortest 
paths, in practice, a router typically splits traffic roughly evenly over each of 
the outgoing links along a shortest path to the destination. The link weights 
are typically configured by the network operators or automated management 
systems, through centralized computation, to satisfy traffic-engineering goals, 
such as minimizing the maximum link utilization or the sum of link cost [24]. 
Following common practice, we use the sum of some increasing and convex
link cost functions as the primary comparison metric and the optimization
objective in this section. 

Setting link weights under OSPF and IS-IS can be categorized as link- 
weight-based traffic engineering, where a set of link weights can uniquely and 
distributively determine the flow of traffic within the network for any given 
traffic matrix. The traffic matrix can be computed based on traffic measure- 
ments (e.g., [20]) or may represent explicit subscriptions or reservations from 
users. Link-weight-based traffic engineering has two key components: a cen- 
tralized approach for setting the routing parameters (i.e., link weights) and 
a distributed way of using these link weights to decide the routes to forward 
packets. Setting the routing parameters based on a networkwide view of the 
topology and traffic, rather than the local views at each router, can achieve 
better performance [22]. 

Evaluation of various traffic engineering schemes, in terms of total link cost 
minimization, can be made against the performance benchmark of optimal 
routing (OPT), which can direct traffic along any paths in any proportion. 
The formulation can be found, for example, in [70]. OPT models an idealized 
routing scheme that can establish one or more explicit paths between every 
pair of nodes, and distribute an arbitrary amount of traffic on each of the 
paths. 

It is easy to construct examples where OSPF, one of the most prevalent IP 
routing protocols today, with the best link weighting performs substantially 
(5000 times) worse than OPT in terms of minimizing sum of link cost. In ad- 
dition, finding the best link weights under OSPF is NP-hard [24]. Although 
the best OSPF link weights can be found by solving an integer linear pro- 
gram (ILP) formulation, such an approach is impractical even for a midsize 
network. Many heuristics, including local search [24] and simulated annealing 
[5, 18] have been proposed to search for the best link weights under OSPF. 
Among them, the local-search technique is the most attractive method for finding 
a good setting of the link weights for large-scale networks. Even though OSPF 
with a good setting of the weights performs within a few percent of OPT for 
some practical scenarios [24, 18, 5], there are still many realistic situations 
where the performance gap between OSPF and OPT could be significant 
even at low utilization. 

There are two main reasons for the difficulty in tuning OSPF for good 
performance. First, the routing mechanism restricts the traffic to be routed 
only on shortest paths (and evenly split across shortest paths, an issue that 
has been addressed in [59]). Second, link weights and the traffic matrix are 
not integrated into the optimization formulation. 

Both bottlenecks are overcome in the distributed exponentially weighted 
flow splitting (DEFT) protocol developed in [70]: 


1. Traffic is allowed to be routed on nonshortest paths, with exponential
penalty on path lengths.

2. An innovative optimization formulation is proposed, where both link
weights and flows are variables. It leads to an effective two-stage
iterative method. 


As a result, DEFT, discussed in this section, has the following desirable 
properties. 


• It determines a unique flow of traffic for a given link weight setting in
polynomial time.

• It is provably always better than OSPF in terms of minimizing the
maximum link utilization or the sum of link cost.

• It is readily implemented as an extension to the existing IGP (e.g., OSPF).

• The traffic engineering under DEFT with the two-stage iterative method
realizes near-optimal flow of traffic even for large-scale network topologies.

• The optimizing procedure for DEFT converges much faster than that for
OSPF. 


In summary, DEFT provides a new way to compute link weights for OSPF
that exceeds the current benchmark based on local search methods while
reducing computational complexity at the same time. Furthermore, the
performance turns out to be very close to that of the much more complicated and
difficult-to-implement family of MPLS-type routing protocols, which allow
arbitrary flow splitting. 

More recently in [71], we have proved that a variation of DEFT, called 
PEFT, can provably achieve the optimal traffic engineering as a link-state 
routing protocol with hop-by-hop forwarding, with the optimal link weights 
computed in polynomial time and much faster than local search methods for 
link weight computation for OSPF. This answers the question on optimal 
traffic engineering by link state routing conclusively and positively. 


5.5.2 DEFT: Framework and Properties 


Given a directed graph $G = (V, E)$ with capacity $c_{u,v}$ for each link $(u,v)$,
let $D(s,t)$ denote the traffic demand originated from node $s$ and destined
for node $t$. $\Phi(f_{u,v}, c_{u,v})$ is a strictly increasing convex function of flow $f_{u,v}$
on link $(u,v)$, typically a piecewise linear cost [24, 59] as shown in equation
(5.45). The networkwide objective is to minimize $\sum_{(u,v) \in E} \Phi(f_{u,v}, c_{u,v})$.

$$
\Phi(f_{u,v}, c_{u,v}) =
\begin{cases}
f_{u,v}, & f_{u,v}/c_{u,v} \le 1/3 \\
3 f_{u,v} - \tfrac{2}{3}\, c_{u,v}, & 1/3 \le f_{u,v}/c_{u,v} \le 2/3 \\
10 f_{u,v} - \tfrac{16}{3}\, c_{u,v}, & 2/3 \le f_{u,v}/c_{u,v} \le 9/10 \\
70 f_{u,v} - \tfrac{178}{3}\, c_{u,v}, & 9/10 \le f_{u,v}/c_{u,v} \le 1 \\
500 f_{u,v} - \tfrac{1468}{3}\, c_{u,v}, & 1 \le f_{u,v}/c_{u,v} \le 11/10 \\
5000 f_{u,v} - \tfrac{16318}{3}\, c_{u,v}, & 11/10 \le f_{u,v}/c_{u,v}.
\end{cases}
\tag{5.45}
$$
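As a concrete illustration of (5.45), the following Python sketch evaluates the piecewise-linear link cost with exact rational arithmetic, so that the continuity at the breakpoints can be checked without rounding. The names `link_cost`, `network_cost`, and `_SEGMENTS` are ours and are introduced only for illustration; the slopes and offsets are those of (5.45).

```python
from fractions import Fraction

# Each entry: (upper utilization bound, slope on f, offset multiplying c),
# transcribed from equation (5.45). None marks the last, unbounded segment.
_SEGMENTS = [
    (Fraction(1, 3), 1, Fraction(0)),
    (Fraction(2, 3), 3, Fraction(2, 3)),
    (Fraction(9, 10), 10, Fraction(16, 3)),
    (Fraction(1), 70, Fraction(178, 3)),
    (Fraction(11, 10), 500, Fraction(1468, 3)),
    (None, 5000, Fraction(16318, 3)),
]


def link_cost(flow, capacity):
    """Piecewise-linear cost Phi(flow, capacity) of equation (5.45)."""
    if flow < 0 or capacity <= 0:
        raise ValueError("flow must be nonnegative and capacity positive")
    flow, capacity = Fraction(flow), Fraction(capacity)
    utilization = flow / capacity
    for upper, slope, offset in _SEGMENTS:
        if upper is None or utilization <= upper:
            return slope * flow - offset * capacity


def network_cost(flows, capacities):
    """Networkwide objective: sum of link costs over all links in `flows`."""
    return sum(link_cost(flows[link], capacities[link]) for link in flows)
```

Because adjacent segments agree at each breakpoint, it does not matter which of the two segments is used there; for instance, `link_cost(9, 10)` gives 110/3 on either side of the utilization 9/10.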

In link-weight-based traffic engineering, each router $u$ needs to make an
independent decision on how to split the traffic destined for node $t$ among
its outgoing links only using link weights. Therefore, it calls for a function
($\Gamma(\cdot) \ge 0$) to represent the traffic allocation. 

Shortest path routing (e.g., OSPF) evenly splits flow across all the
outgoing links as long as they are on the shortest paths. First of all, we need a
variable to indicate whether link $(u,v)$ is on the shortest path to $t$. Denote
$w_{u,v}$ as the weight for link $(u,v)$, and $d^t_u$ as the shortest distance from node
$u$ to node $t$; then $d^t_v + w_{u,v}$ is the distance from $u$ to $t$ when routed through
$v$. The gap of the two above distances, $h^t_{u,v} = d^t_v + w_{u,v} - d^t_u$, is always larger
than or equal to 0. Then $(u,v)$ is on the shortest path to $t$ if and only if
$h^t_{u,v} = 0$. Accordingly, we can use a unit step function of $h^t_{u,v}$ to represent
the traffic allocation for OSPF as follows.

$$
\Gamma(h^t_{u,v}) =
\begin{cases}
1, & \text{if } h^t_{u,v} = 0 \\
0, & \text{if } h^t_{u,v} > 0.
\end{cases}
\tag{5.46}
$$

The flow proportion on the outgoing link $(u,v)$ destined for $t$ at $u$ is

$$
\Gamma(h^t_{u,v}) \Big/ \sum_{(u,j) \in E} \Gamma(h^t_{u,j}).
$$

Denote $f^t_{u,v}$ as the flow on link $(u,v)$ destined for node $t$ and $f^t_u$ as the flow
sent along the shortest path of node $u$ destined for $t$; then

$$
f^t_{u,v} = f^t_u \, \Gamma(h^t_{u,v}). \tag{5.47}
$$
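To make the OSPF allocation rule concrete, here is a minimal Python sketch that computes the distances $d^t_u$ by running Dijkstra's algorithm on the reversed links, forms the gaps $h^t_{u,v}$, and applies the unit step function (5.46). The function names and the dictionary-based graph representation are our own assumptions; integer link weights are assumed so that the test `gap == 0` is exact, and node `u` is assumed to be able to reach the destination.

```python
import heapq


def distances_to(dest, weights):
    """Shortest distance from every node to dest; weights maps (u, v) -> w_uv > 0."""
    incoming, nodes = {}, set()
    for (u, v), w in weights.items():
        incoming.setdefault(v, []).append((u, w))
        nodes.update((u, v))
    dist = {node: float("inf") for node in nodes}
    dist[dest] = 0.0
    heap = [(0.0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in incoming.get(v, []):
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    return dist


def ospf_split(u, dest, weights):
    """Proportion of u's traffic for dest placed on each outgoing link (u, v)."""
    dist = distances_to(dest, weights)
    gamma = {}
    for (a, v), w in weights.items():
        if a == u:
            gap = dist[v] + w - dist[u]            # h^t_{u,v} >= 0
            gamma[v] = 1.0 if gap == 0 else 0.0    # unit step (5.46)
    total = sum(gamma.values())
    if total == 0:
        raise ValueError("node has no shortest-path link toward the destination")
    return {v: g / total for v, g in gamma.items()}
```

For a hypothetical five-link network in which node `a` reaches `t` through `b` or `c` at distance 2, or directly at distance 3, the rule splits the traffic evenly over the two shortest next hops and sends nothing on the direct link:

```python
weights = {("a", "b"): 1, ("a", "c"): 1, ("b", "t"): 1, ("c", "t"): 1, ("a", "t"): 3}
print(ospf_split("a", "t", weights))
```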
The $\Gamma(h^t_{u,v})$ function (5.46) (i.e., evenly splitting) results in
intractability in searching for the best link weights under OSPF. In part inspired by
Fong et al.’s work in [21], we can define a new $\Gamma(h^t_{u,v})$ function to allow
for flow on nonshortest paths. Intuitively, we may want to send more traffic
on the shortest path than on a nonshortest path. Moreover, the traffic on
a nonshortest path should be 0 if the distance gap between the nonshortest
path and the shortest path is infinitely large. Based on the above intuition,
$\Gamma(h^t_{u,v})$ should be a strictly decreasing continuous function of $h^t_{u,v}$ bounded
within [0,1]. The exponential function is one of the natural choices, and the
performance of using such a function turns out to be excellent.

In [70], we propose an IGP with distributed exponentially weighted flow
splitting:

$$
\Gamma(h^t_{u,v}) =
\begin{cases}
e^{-h^t_{u,v}}, & \text{if } d^t_u > d^t_v \\
0, & \text{otherwise;}
\end{cases}
\tag{5.48}
$$

that is, the routers can direct traffic on nonshortest paths, with an
exponential penalty on longer paths. 
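The splitting rule (5.48) can be turned into a flow computation along the following lines. This is only a sketch under our own assumptions, not the implementation of [70]: the name `deft_flow`, the dictionary representation of weights and demands, and the farthest-first processing order are ours. The order is valid because (5.48) only places traffic on links that strictly decrease the distance to the destination, so the per-destination flow is acyclic.

```python
import heapq
import math
from collections import defaultdict


def distances_to(dest, weights):
    """Shortest distance from every node to dest (Dijkstra on reversed links)."""
    incoming, nodes = defaultdict(list), set()
    for (u, v), w in weights.items():
        incoming[v].append((u, w))
        nodes.update((u, v))
    dist = {node: math.inf for node in nodes}
    dist[dest] = 0.0
    heap = [(0.0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in incoming[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    return dist


def deft_flow(weights, demands):
    """Aggregate link flows f_{u,v} under the DEFT splitting rule (5.48).

    weights: {(u, v): w_uv > 0};  demands: {(s, t): D(s, t)}.
    """
    link_flow = defaultdict(float)
    for dest in {t for (_, t) in demands}:
        dist = distances_to(dest, weights)
        # Traffic entering each node for this destination, seeded with demands.
        inflow = defaultdict(float)
        for (s, t), amount in demands.items():
            if t == dest:
                inflow[s] += amount
        # Visit nodes from farthest to nearest so that all upstream
        # contributions to a node are known before it is split.
        for u in sorted(dist, key=dist.get, reverse=True):
            if u == dest or inflow[u] == 0.0:
                continue
            gamma = {
                v: math.exp(-(dist[v] + w - dist[u]))   # e^{-h^t_{u,v}}
                for (a, v), w in weights.items()
                if a == u and dist[u] > dist[v]
            }
            total = sum(gamma.values())
            for v, g in gamma.items():
                share = inflow[u] * g / total
                link_flow[(u, v)] += share
                inflow[v] += share
    return dict(link_flow)
```

On a hypothetical network where `a` reaches `t` through `b` or `c` at distance 2 and directly at distance 3, the direct link has gap $h = 1$ and therefore receives the fraction $e^{-1}/(2 + e^{-1})$ of the traffic, whereas OSPF would leave it unused.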
The following properties of DEFT can be proved [70]. 

Proposition 5.6. DEFT can realize any acyclic flow for a single-destination
demand within polynomial time. It can also achieve optimal routing with
a single destination within polynomial time. For any traffic matrix, it can
determine a unique flow for a given link weighting within polynomial time. 


Proposition 5.7. DEFT is always better than OSPF in terms of minimizing 
total link cost or the maximum link utilization. 


5.5.3 DEFT: Optimization Formulation and Solutions 


Note that it is still difficult to directly integrate the exponentially weighted
flow splitting of DEFT into an optimization formulation because of its
discrete feature; that is, the traffic destined for node $t$ can be sent through link
$(u,v)$ if and only if $d^t_u > d^t_v$. Instead of introducing some binary variables, we
relax (5.48) into (5.49) first, and then, by properly setting the lower bound
of all link weights, a constant parameter $w_{\min}$, make such relaxation as tight
as we want:

$$
\Gamma(h^t_{u,v}) = e^{-h^t_{u,v}}. \tag{5.49}
$$

Indeed, consider a flow solution satisfying (5.49); if there is a link $(u,v)$ where
$d^t_v \ge d^t_u$ and $f^t_{u,v} > 0$, then
$f^t_{u,v} \le f^t_u e^{-h^t_{u,v}} = f^t_u e^{-(d^t_v + w_{u,v} - d^t_u)} \le f^t_u e^{-w_{\min}}$.
If $w_{\min}$ is large enough, this flow portion, which is infeasible to DEFT on link
$(u,v)$, could be neglected. 
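To give a feeling for how quickly this bound tightens, the following worked evaluation uses two values of $w_{\min}$ that we picked purely for illustration (they are not taken from [70]):

```latex
\[
\frac{f^t_{u,v}}{f^t_u} \le e^{-w_{\min}}, \qquad
w_{\min} = 5 \;\Rightarrow\; e^{-5} \approx 6.7 \times 10^{-3}, \qquad
w_{\min} = 10 \;\Rightarrow\; e^{-10} \approx 4.5 \times 10^{-5}.
\]
```

In other words, with a lower bound of 10 on every link weight, the flow that the relaxation places on a link that DEFT would not use is below 0.005% of $f^t_u$.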

Therefore, we present the following optimization problem, called ORIG, 
using the relaxed rule of flow splitting as the approximation for the traffic 
engineering under DEFT. 


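$$
\begin{aligned}
\text{minimize} \quad & \sum_{(u,v) \in E} \Phi(f_{u,v}, c_{u,v}) && (5.50) \\
\text{subject to} \quad & \sum_{z:(y,z) \in E} f^t_{y,z} - \sum_{x:(x,y) \in E} f^t_{x,y} = D(y,t), \quad \forall y \ne t && (5.51) \\
& f_{u,v} = \sum_{t \in V} f^t_{u,v} && (5.52) \\
& h^t_{u,v} = d^t_v + w_{u,v} - d^t_u && (5.53) \\
& f^t_{u,v} = f^t_u \, e^{-h^t_{u,v}} && (5.54) \\
& f^t_u = \max_{(u,v) \in E} f^t_{u,v} && (5.55) \\
\text{variables} \quad & w_{u,v} \ge w_{\min}, \; f^t_u, \; d^t_u, \; h^t_{u,v}, \; f^t_{u,v}, \; f_{u,v} \ge 0. && (5.56)
\end{aligned}
$$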


Note that both the flow splittings and the link weights are incorporated as 
optimization variables in one problem, with further constraints relating them. 
Constraint (5.51) is to ensure flow conservation at an intermediate node y. 
Constraint (5.52) is for flow aggregation on each link. Constraint (5.53) is 
from the definition of gap of shortest distance. Constraints (5.54) and (5.55) 
come from (5.47) and (5.49). In addition, (5.54) and (5.55) also imply that 
$f^t_{u,v} \le f^t_u$ and that $h^t_{u,v}$ of at least one of the outgoing links $(u,v)$ of node $u$
destined for node $t$ should be 0; that is, the link $(u,v)$ is on the shortest path
from node $u$ to node $t$. 

Problem ORIG is nonsmooth and nonconvex due to nonsmooth constraint 
(5.55) and nonlinear equality (5.54). In [70], we propose a two-stage iterative 
relaxation to solve problem ORIG. 

First, we relax constraint (5.55) into (5.57) below: 


$$
f^t_u \le \sum_{(u,v) \in E} f^t_{u,v}, \quad \forall t \in V, \; \forall u \in V. \tag{5.57}
$$


Equations (5.50)—(5.54), (5.56), and (5.57) constitute problem APPROX. 

We only need to obtain a “reasonably” accurate solution (link weighting
$W$) to problem APPROX because the inaccuracy caused by the relaxation
(5.57) will be compensated by a subsequent refinement process. From the
$W$, we can derive the shortest path tree $T(W, t)$⁶ for each destination $t$, and
all other dependent variables ($d^t_u$, $h^t_{u,v}$, $f^t_u$, $f^t_{u,v}$, $f_{u,v}$) within DEFT.

We then use these values as the initial point (which is also strictly feasible)
for a new problem REFINE, which consists of equations (5.50)–(5.54), (5.56),
and (5.58) below:

$$
f^t_u = f^t_{u,v}, \quad \forall t \in V, \; \forall u \in V, \; (u,v) \in T(W, t). \tag{5.58}
$$


With the two-stage iterative method, we are left with two optimization
problems, APPROX and REFINE, both of which have convex objective
functions and twice continuously differentiable constraints. To solve the
large-scale nonlinear problems APPROX and REFINE (with $O(|V||E|)$ variables
and constraints), we extend the primal-dual interior point filter line search
algorithm, IPOPT [68], by solving a set of barrier problems for a decreasing
sequence of barrier parameters $\mu$ converging to 0. 

In summary, in solving problem APPROX, we mainly want to determine 
the shortest path tree for each destination (i.e., deciding which outgoing link 
should be chosen on the shortest path). Then in solving problem REFINE, 
we can tune the link weights (and the corresponding flow) with the same 
shortest path trees as in APPROX. 

The pseudocode of the proposed two-stage iterative method for DEFT
is shown in Algorithms 6A and 6B. Most instructions are self-explanatory.
Function DEFT_FLOW($W$) is used to derive a flow from a set of link
weights $W$. Given the initial and ending values for barrier parameter $\mu$,
maximum iteration number, with/without initial link weighting/flow,
function DEFT_IPOPT() returns a new set of link weights as well as a new flow.
Note that, as shown in Algorithm 6B, when DEFT_IPOPT() is used for
problem APPROX, it returns with the last iteration rather than the iteration with
the best Flow_i in terms of the objective value as in problem REFINE. This
is because problem APPROX has different constraints from problem ORIG
and a too greedy method may leave little search freedom for the subsequent
REFINE problem. Finally, we need to specify initial and terminal
$\mu$ values ($\mu_{\mathrm{init}} > \mu_{\mathrm{end\_approx}} > \mu_{\mathrm{end\_refine}}$) and maximum iteration numbers
$\mathrm{Iter}_{\mathrm{approx}} > \mathrm{Iter}_{\mathrm{refine}}$. As shown in the next section, it is straightforward to
specify these parameters.


6 To keep T(W, t) as a tree, only one downstream node is chosen if a node can reach the
destination through several downstream nodes with the same distance. 


Algorithm 6A. DEFT Solution. 


1. (μ, W) ← DEFT_IPOPT(μ_init, μ_end_approx, Iter_approx, nil)

2. Initial_Point ← (W, DEFT_FLOW(W))

3. (μ, W) ← DEFT_IPOPT(μ, μ_end_refine, Iter_refine, Initial_Point)

4. Return (W, DEFT_FLOW(W)) 


Algorithm 6B. DEFT_IPOPT.


If Initial_Point ≠ nil then
    Initiate the problem with Initial_Point /*REFINE*/
end if
For each iteration i ≤ Iter_max, with μ_start ≥ μ ≥ μ_end do
    μ_i ← current value for μ
    W_i ← current values for all w_{u,v}
    Flow_i ← DEFT_FLOW(W_i)
end for
If Initial_Point = nil then
    return (μ_i, W_i) of the last iteration /*APPROX*/
else
    return (μ_i, W_i) of the iteration with the best Flow_i in terms of objective
    value /*REFINE*/
end if 
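The control flow of Algorithms 6A and 6B can be sketched in Python as follows. The interior point solver itself is not reproduced: `deft_ipopt` and `deft_flow` are passed in as callables whose signatures mirror the pseudocode, and `select_iterate` isolates the return rule of Algorithm 6B. All names, the tuple layout of `history`, and the use of `None` for nil are our assumptions for illustration only.

```python
def deft_solution(deft_ipopt, deft_flow, mu_init, mu_end_approx, mu_end_refine,
                  iter_approx, iter_refine):
    """Two-stage method of Algorithm 6A.

    deft_ipopt(mu_start, mu_end, iter_max, initial_point) -> (mu, weights)
    deft_flow(weights) -> flow
    """
    # Stage 1 (APPROX): no initial point, so the solver starts from scratch.
    mu, weights = deft_ipopt(mu_init, mu_end_approx, iter_approx, None)
    # The APPROX weights and their DEFT flow seed the second stage.
    initial_point = (weights, deft_flow(weights))
    # Stage 2 (REFINE): continue from the barrier parameter reached in stage 1.
    mu, weights = deft_ipopt(mu, mu_end_refine, iter_refine, initial_point)
    return weights, deft_flow(weights)


def select_iterate(history, refine):
    """Return rule of Algorithm 6B.

    history: list of (mu_i, W_i, objective_i), one entry per iteration, where
    objective_i is the objective value of Flow_i = DEFT_FLOW(W_i).
    """
    if not history:
        raise ValueError("no iterations were recorded")
    if refine:
        # REFINE: keep the iterate whose DEFT flow has the lowest objective.
        mu, weights, _ = min(history, key=lambda item: item[2])
    else:
        # APPROX: keep the last iterate, even if an earlier one looked better.
        mu, weights, _ = history[-1]
    return mu, weights
```

Passing the solver as an argument keeps the sketch independent of any particular optimization library; in an actual implementation, `deft_ipopt` would wrap the barrier iterations and call `select_iterate` on the recorded iterates before returning.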


5.5.4 Numerical Examples 


We summarize some of the numerical results in [70] on various schemes under 
many practical scenarios. We employ the same cost function (5.45) as in [23]. 
The primary metric used is the optimality gap, in terms of total link cost, 
compared against the value achieved by optimal routing using CPLEX 9.1 [16] 
via AMPL [25]. The secondary metric used is the maximum link utilization. 
We do not reproduce the performance of some obvious link-weight-based traf- 
fic engineering approaches for OSPF, for example, UnitOSPF (setting all link 
weights to 1), RandomOSPF (choosing the weights randomly), InvCapOSPF 
(setting the weight of a link inversely proportional to its capacity as
recommended by Cisco), or L2OSPF (setting the weight proportional to its
physical Euclidean distance) [23], because none of them performs as well as
the state-of-the-art local search method proposed in [23]. In addition, because 
DEFT is always better than OSPF in terms of minimizing the maximum link 
utilization or the sum of link cost, we bypass the scenarios where OSPF can 
achieve near-optimal solution. Instead, we are particularly interested in those 
scenarios where OSPF does not perform well. 

For fair comparisons, we use the same topology and traffic matrix as 
those in [23]. The 2-level hierarchical networks were generated using GT-ITM
and consist of two kinds of links: local access links with 200-unit
capacity and long distance links with 1000-unit capacity. In the second type 
of topology, the random topologies, the probability of having a link between 
two nodes is a constant parameter and all link capacities are 1000 units. 

Although AT&T’s proprietary code of local search used in [23] is not publicly
available, there is an open source software project with IGP weight
optimization, TOTEM 1.1 [66]. It follows the same lines as [23] and produces
results of similar quality. It is slightly slower due to the lack of implementation 
local search as in [24, 23] where link weight is restricted as an integer from 1 
to 20, initial link weights are chosen randomly, and the best result is collected 
after 5000 iterations. 

To implement the proposed two-stage iterative method for DEFT, we modify
another open source software, IPOPT 3.1 [34], and adjust its AMPL
interface to integrate it into our test environment. We choose $\mu_{\mathrm{init}} = 0.1$ for
most cases except for $\mu_{\mathrm{init}} = 10$ for the 100-node network with heavy traffic
load. We also choose $\mu_{\mathrm{end\_approx}}$ = 10~*, $\mu_{\mathrm{end\_refine}}$ = 10~°, and maximum
iteration numbers $\mathrm{Iter}_{\mathrm{approx}} = 1000$, $\mathrm{Iter}_{\mathrm{refine}} = 400$. The code terminates
early if the optimality gap falls below 0.1%. 


Example 5.9. DEFT and OSPF on 2-level topology. The results for a 2-level 
topology with 50 nodes and 212 links with seven different traffic matrices are 
shown in Table 5.5. The results are also depicted graphically in Figure 5.15. 


Table 5.5 Results of 2-level topology with 50 nodes and 212 links 


| Total Traffic Demand | 1700  | 2000  | 2200  | 2500  | 2800  | 3100  | 3400   |
| Ave Link Load-OPT    | 0.128 | 0.148 | 0.17  | 0.192 | 0.216 | 0.242 | 0.267  |
| Max Link Load-OPT    | 0.667 | 0.667 | 0.667 | 0.9   | 0.9   | 0.9   | 0.9    |
| Opt Gap-OSPF         | 2.8%  | 4.4%  | 7.2%  | 9.4%  | 20.7% | 64.2% | 202.8% |
| Opt Gap-DEFT         | 0.1%  | 0.1%  | 0.1%  | 0.1%  | 0.1%  | 0.1%  | 0.1%   |


In addition to the two metrics, optimality gap in terms of total link cost
and maximum link utilization,⁷ we also show the average link utilization
under optimal routing as an indication of network load. From the results, we
can observe that the gap between OSPF and optimal routing can be very
significant (up to 222.8%) for a practical network scenario, even when the
average link utilization is low (<27%). In contrast, DEFT can achieve almost
the same performance as the optimal routing in terms of both total link cost
and maximum link utilization.


7 Note, however, maximum link utilization is not a metric as comprehensive as total link
cost because it cannot indicate whether there are multiple overcongested links.


Fig. 5.15 Comparison of DEFT and local search OSPF in terms of optimality gap and
maximum link utilization for a 2-level topology with 50 nodes and 212 links (Example 5.9).

Fig. 5.16 2-level topology with 50 nodes and 148 links (Example 5.9). 
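The two metrics can be computed from a routing solution as in the short Python sketch below. We assume here that the optimality gap is the relative excess of the total link cost over the cost of optimal routing; the text does not spell out the formula, so this reading, like the function names, is our assumption.

```python
def optimality_gap(total_cost, optimal_cost):
    """Relative excess of a scheme's total link cost over the OPT cost."""
    if optimal_cost <= 0:
        raise ValueError("optimal cost must be positive")
    return (total_cost - optimal_cost) / optimal_cost


def max_link_utilization(flows, capacities):
    """Largest ratio f_{u,v} / c_{u,v} over all links carrying flow."""
    return max(flows[link] / capacities[link] for link in flows)
```

Under this reading, a scheme whose total link cost is 1.5 times that of optimal routing has a gap of 50%, and a gap above 100% means that the cost has more than doubled.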


Example 5.10. DEFT and OSPF on random topology. Similar observations
can be found for other scenarios, for example, as shown in Figure 5.17.
Without exception, the curves of the DEFT scheme (the horizontal lines almost
coinciding with x-axes) almost completely overlap those of optimal routing,
in terms of total link cost and maximum link utilization. Among these
numerical experiments, the maximum optimality gap of OSPF is as high as 252%
and that of DEFT is only at worst 1.5%. In addition, DEFT reduces the
maximum link utilization compared to OSPF on all tests, and substantially
on some tests.


Fig. 5.17 Random topology with 50 nodes and 245 links (Example 5.10). 


Simulations on rate of convergence, as well as comparisons of computation 
and implementation complexity, can be found in [70]. 


5.5.5 Concluding Remarks and Future Directions 


Network operators today try to alleviate congestion in their own network by 
tuning the parameters in IGP. Unfortunately, traffic engineering under OSPF 
or IS-IS to avoid networkwide congestion is computationally intractable, forc- 
ing the use of local-search techniques. While staying within the context of 
link-weight-based traffic engineering, we propose a new protocol called
distributed exponentially weighted flow splitting (DEFT) [70]. DEFT significantly
outperforms the state-of-the-art OSPF local search mechanisms in minimizing 
networkwide congestion. The success of DEFT can be attributed to two ad- 
ditional features. First, DEFT can put traffic on nonshortest paths, with an 
exponential penalty on longer paths. Second, DEFT solves the resulting op- 
timization problem by integrating link weights and the corresponding traffic 
distribution together in the formulation. The novel formulation leads to a 
much more efficient way of tuning link weights than the existing local search
heuristic for OSPF.

DEFT is readily implementable as an extension to existing IGPs. It is 
provably always better than OSPF in minimizing the sum of link cost. DEFT 
retains the simplicity of having routers compute paths based on configurable 
link weights, while approaching the performance of the much more complex 
routing protocols that can split traffic arbitrarily over any paths. In summary, 
in terms of minimizing total link cost, performance of OSPF by local search 
heuristics is at best what is attained by solving the ILP, which is substantially 
outperformed by DEFT that comes very close to the optimal routing. In terms 
of a performance-complexity tradeoff, DEFT clearly exceeds OSPF. 

In this section, we only address the link weighting under DEFT for a given 
traffic matrix. The next challenge would be to explore robust optimization 
under DEFT, optimizing to select a single weight setting that works for a 
range of traffic matrices and/or a range of link/node failure scenarios. Ex- 
tension of the ideas behind DEFT to routing across different autonomous 
systems managed by different network operators is another interesting future 
direction. 

In the larger picture of “design for optimizability”, DEFT shows one case 
where by changing the underlying protocol, the resulting new optimization 
formulation becomes much more readily solvable or approximable. We expect 
this new approach to tackle nonconvex problems to bring many new results 
and insights to the engineering of communication networks. Indeed, in an 
extension of DEFT work [71], we have developed the first provably optimal 
link state routing protocol with hop by hop forwarding, called PEFT, which 
achieves optimal traffic engineering with polynomial time (and very fast in 
practice) computation of optimal link weights. 


Chapter 6 

Multilevel (Hierarchical) Optimization: 
Complexity Issues, Optimality 
Conditions, Algorithms 


Altannar Chinchuluun, Panos M. Pardalos, and Hong-Xuan Huang 


Summary. In this chapter we discuss some algorithmic and theoretical re- 
sults on multilevel programming including complexity issues, optimality con- 
ditions, and algorithmic methods for solving multilevel programming prob- 
lems. We also discuss an approach, which is called the multivariate partition 
approach, for solving a single-level mathematical programming problem based 
on its equivalent multilevel programming formulation. 


Key words: Hierarchy, multilevel programming, multivariate partition ap- 
proach 


6.1 Introduction 


The word hierarchy comes from the Greek word “ἱεραρχία,” a system of 
graded (religious) authority. Hierarchical (multilevel) structures are found 
in many complex systems and in particular in biology. Biological systems 
are characterized by hierarchical architectural designs in which organization 
is controlled on length scales ranging from the molecular to macroscopic. 
These hierarchical architectures rely on critical interfaces that link structural 
elements of disparate scale. 

Nature makes very different systems (that have specific hierarchical
composite structures) out of very similar molecular constituents. First, the
structures are organized in discrete levels. Second, the levels of structural
organization are held together by specific interactions between components. Finally,
these interacting levels are organized into an oriented distinct hierarchical
composite system of specific function.


Altannar Chinchuluun · Panos M. Pardalos
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL
32611, U.S.A., e-mail: altannar@ufl.edu, pardalos@ufl.edu


Hong-Xuan Huang
Department of Industrial Engineering, Tsinghua University
e-mail: huanghongxuan@tsinghua.org.cn


D.Y. Gao, H.D. Sherali, (eds.), Advances in Applied Mathematics and Global Optimization 197
Advances in Mechanics and Mathematics 17, DOI 10.1007/978-0-387-75714-8_6,
© Springer Science+Business Media, LLC 2009 

The mathematical study of hierarchical structures can be found in di- 
verse scientific disciplines including environment, ecology, biology, chemical 
engineering, classification theory, databases, network design, game theory, 
and economics. The study of hierarchy occurring in biological structures re- 
veals interesting properties as well as limitations due to different properties 
of molecules. Understanding the complexity of hierarchical designs requires 
systems methodologies that are amenable to modeling, analyzing, and opti- 
mizing these structures. 

Hierarchical (or multilevel) optimization can be used to study properties 
of these hierarchical designs. In hierarchical optimization, the constraint do- 
main is implicitly determined by a series of optimization problems that must 
be solved in a predetermined sequence. Hierarchical optimization is a gener- 
alization of mathematical programming. The simplest two-level (or bilevel) 
programming problem describes a hierarchical system composed of two levels 
of decision makers. 

A bilevel programming problem consists of two optimization problems 
where the constraint set of the upper-level problem is implicitly determined 
by the lower-level problem. 


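$$
\begin{aligned}
\min_{x} \quad & F(x, y) \\
\text{s.t.} \quad & G(x, y) \le 0,
\end{aligned}
\tag{6.1}
$$

where $y$ solves

$$
\begin{aligned}
\min_{z} \quad & f(x, z) \\
\text{s.t.} \quad & g(x, z) \le 0, \\
& x \in X \subset R^n, \quad y, z \in Y \subset R^m,
\end{aligned}
$$

where $X \subset R^n$ and $Y \subset R^m$ are compact sets, $G : R^n \times R^m \to R^p$,
$g : R^n \times R^m \to R^q$, and $F, f : R^n \times R^m \to R$ are scalar functions. Let
$\Omega = \{(x,y) \mid G(x,y) \le 0, \; g(x,y) \le 0, \; (x,y) \in R^n \times R^m\}$ be the constraint
set of the problem. We also denote the set of solutions of the lower-level
programming problem by $S(x)$ for any $x \in X$. 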

When the objective functions and constraint functions are linear, Problem 
(6.1) is called a bilevel linear programming problem. For a bilevel linear 
programming problem, it is known that the solution to the problem occurs 
at an extreme point of the feasible set [16, 9]. If at least one of the objective 
functions or constraint functions is nonlinear, then Problem (6.1) is called 
a nonlinear bilevel programming problem. In general, bilevel optimization 
problems are nonconvex, and therefore, it is not easy to find globally optimal 
solutions. 
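
The nonconvexity can be seen even on a very small instance. The following
Python sketch uses a hypothetical two-variable linear bilevel problem (the
data and all function names are assumptions made for illustration and are
not taken from the text). The lower-level solution $y(x)$ is available in
closed form here, so the reduced upper-level objective $F(x, y(x))$ can
simply be scanned on a grid.

```python
# Illustrative toy instance (assumed data, not from the text):
#   upper level:  min_x  F(x, y) = x - 4y
#   lower level:  min_y  f(x, y) = y
#                 s.t.  -x - y <= -3,  -2x + y <= 0,
#                        2x + y <= 12,  3x - 2y <= 4,  x >= 0, y >= 0.

def lower_level_response(x):
    """Return the lower-level minimizer y(x), or None if no feasible y exists."""
    y_lo = max(3.0 - x, (3.0 * x - 4.0) / 2.0, 0.0)  # lower bounds on y
    y_hi = min(2.0 * x, 12.0 - 2.0 * x)              # upper bounds on y
    return y_lo if y_lo <= y_hi + 1e-12 else None


def upper_objective(x, y):
    """Upper-level objective F(x, y)."""
    return x - 4.0 * y


def reduced_objective(x):
    """F(x, y(x)); None where the lower-level problem is infeasible."""
    y = lower_level_response(x)
    return None if y is None else upper_objective(x, y)


def solve_on_grid(n_points=1201, x_max=6.0):
    """Brute-force scan of a uniform grid of upper-level decisions x in [0, x_max]."""
    best = None
    for i in range(n_points):
        x = x_max * i / (n_points - 1)
        value = reduced_objective(x)
        if value is not None and (best is None or value < best[0]):
            best = (value, x, lower_level_response(x))
    return best  # (F value, x, y)
```

For this instance the reduced objective is piecewise linear on $1 \le x \le 4$:
it equals $-7$ at $x = 1$, rises to $-2$ at $x = 2$, and falls to $-12$ at
$x = 4$. The point $x = 1$ is therefore a local but not a global solution,
although every function in the problem is linear. A grid scan is only viable
for such tiny examples.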

Bilevel programming has a wide range of applications in many areas including
economics [8], network design [82, 15], transport planning and modeling
[69, 71], credit allocation [74], and electric power pricing [51]. The reader
is referred to Du and Pardalos [38] and Migdalas et al. [72] for a more
comprehensive survey of applications. An excellent bibliographical survey of
bilevel programming can also be found in Vicente and Calamai [88].

Hierarchical structures appear to be harder to manage than completely
centralized systems. What, then, is the rationale for the existence of
hierarchical structures? Answers to such questions may help us to understand
the reason behind hierarchical structures in biology.

Multilevel programming problems have been studied extensively in their 
general setting for the last decade. Many algorithmic developments are based 
on the properties of special cases of the bilevel problem (and the more gen- 
eral problem) and reformulations to equivalent or approximating models, 
presumably more tractable. Most of the exact methods are based on branch- 
and-bound or cutting plane techniques and can handle only moderately sized 
problems. 

The present chapter is organized as follows. In Section 6.2, we consider
complexity issues of bilevel programming, and Section 6.3 covers optimality
conditions of the problem. We discuss some of the algorithmic methods for
solving the bilevel programming problem in Section 6.4. We present a
multivariate partition approach for solving a single-level mathematical
programming problem based on its equivalent bilevel programming formulation
in Section 6.5. We also discuss two real-world applications of bilevel
programming in transportation systems in Section 6.6, and some open problems
in multilevel programming in Section 6.7.


6.2 Complexity Issues 


Complexity analyzes the intrinsic difficulty of optimization problems and 
reveals surprising connections among many other optimization problems and 
their solutions. 


6.2.1 Complexity of Finding a Global Solution 


Calamai and Vicente [22] proposed a technique to generate bilevel
programming problems. Using their technique, a linear bilevel programming
problem with an exponential number of local minima can be generated.
Jeroslow [56] has shown that the linear bilevel program is NP-hard. His
results were subsequently confirmed and simplified by Ben-Ayed and Blair [14]
and Blair [19], and strengthened by Hansen et al. [46]. Hansen et al.
demonstrated that the linear bilevel problem without upper-level constraints,
which is a more restricted max-min problem, is strongly NP-hard by
considering the complexity of a special case of a linear max-min optimization
problem. They reduced KERNEL, which is known to be NP-hard [43], to this
problem. Max-min linear programs can be formulated as convex maximization
(or concave minimization) problems. The computational complexity of min-max
optimization problems has been extensively studied by Ko and Lin [61]. These
problems are naturally characterized by $\Pi_2^P$, the second level of the
polynomial-time hierarchy.

For higher-level hierarchies, Jeroslow has shown that the optimal value of
a $(k + 1)$-level linear program is $\Sigma_k^P$-hard.


6.2.2 Complexity of Local Search Methods 


Computing locally optimal solutions is presumably easier than finding glob- 
ally optimal solutions in practice. However, from the complexity point of view 
it has been shown that the problem of checking local optimality for a feasible 
point and the problem of checking whether a local minimum is strict, are 
NP-hard even for instances of quadratic problems with a simple structure in 
the constraints and the objective. These results have been proved using the 
classical 3-satisfiability problem. 

Vicente et al. [90] proved that checking local optimality in bilevel
programming is an NP-hard problem. The proof uses an idea similar to that of
Pardalos and Schnitger [77].


6.2.3 Approximation Algorithms 


Jeroslow [56] observed that, for any constant factor $c$, it remains NP-hard
to find a solution within a multiplicative factor of $c$ of the optimum.

Deng et al. [37] have extended this result to disallow even an additive
constant for a sufficiently small multiplicative factor. That is, for
sufficiently small $c_1 > 0$ and for some $c_2 > 0$, there is no
polynomial-time algorithm that can guarantee a solution within
$(1 + c_1) \cdot \text{optimum} + c_2$ unless P = NP.


6.2.4 Polynomially Solvable Problems 


Liu and Spencer [65] have introduced a polynomial-time algorithm for a
bilevel linear problem when there are a constant number of lower-level control
variables.

Deng et al. [37] have presented a much simpler proof for the above result
which also allows for an extension to a k-level linear programming problem
when the total number of variables controlled by lower-level linear programs
is a constant.


6.3 Optimality Conditions 


Bard [9] first derived optimality conditions for the bilevel programming
problem based on an equivalent single-level reformulation of the problem.
However, Clark and Westerberg [30] gave a counterexample to these conditions.

Ye et al. [95] presented necessary optimality conditions for the generalized
bilevel programming problem, which is a bilevel programming problem with
a variational inequality as the lower-level problem. They proved that the
generalized bilevel programming problem is equivalent to single-level problems
under some assumptions on the objective function. Then, Karush-Kuhn-Tucker
optimality conditions are applied to the single-level problems to derive
optimality conditions for the generalized bilevel programming problem.

Let us define the concept of global and local optimality of the bilevel 
problem: 


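\[
\begin{aligned}
\min_{x}\quad & F(x, y) \\
\text{s.t.}\quad & G(x) \le 0, \\
& y \text{ solves} \\
& \quad \min_{z}\ f(x, z) \\
& \quad \text{s.t.}\ \ g(x, z) \le 0, \\
& x \in \mathbb{R}^n, \quad y, z \in \mathbb{R}^m,
\end{aligned}
\tag{6.2}
\]

where $X = \{x \mid G(x) \le 0\}$ is a compact set.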


Definition 6.1. A point $(x^*, y^*)$ is called a local optimal solution of
Problem (6.2) if and only if $x^* \in X$, $y^* \in S(x^*)$ with

\[
F(x^*, y^*) \le F(x^*, y) \quad \text{for all } y \in S(x^*),
\]

and there exists $\delta > 0$ such that

\[
\phi(x^*) \le \phi(x) \quad \text{for all } x \in X \cap B(x^*, \delta),
\]

where $\phi(x) = \min\{F(x, y) \mid y \in S(x)\}$.

The point $(x^*, y^*)$ is called a global optimal solution of Problem (6.2)
if we can select $\delta = \infty$.

Here, $B(x^*, \delta)$ is the open ball of radius $\delta$ centered at the
point $x^* \in X$; that is,
$B(x^*, \delta) = \{x \in \mathbb{R}^n \mid \|x - x^*\| < \delta\}$. As we
mentioned before, the bilevel problem can be formulated as a single-level
problem by replacing the lower-level problem by its Karush-Kuhn-Tucker
conditions under some constraint qualification if the lower-level problem is
convex. Then, the resulting problem is a smooth optimization problem.
However, Scheel and Scholtes [80] showed that the equivalent single-level
formulation of the bilevel problem violates most of the usual constraint
qualifications. Therefore, they introduced a nonsmooth version of the
Karush-Kuhn-Tucker conditions:


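\[
\begin{aligned}
\min_{x, y, \lambda}\quad & F(x, y) \\
\text{s.t.}\quad & G(x) \le 0, \\
& \nabla_y L(x, y, \lambda) = 0, \\
& \min\{-g_j(x, y), \lambda_j\} = 0, \quad j = 1, \dots, q,
\end{aligned}
\tag{6.3}
\]

where $L(x, y, \lambda) = f(x, y) + \lambda^T g(x, y)$ is the Lagrangian of the
lower-level problem.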
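
The constraint $\min\{-g_j(x, y), \lambda_j\} = 0$ is a compact way of writing
the three complementarity conditions $g_j \le 0$, $\lambda_j \ge 0$, and
$\lambda_j g_j = 0$. The short Python sketch below, with hypothetical
function names chosen only for illustration, checks this equivalence
numerically for given vectors of constraint values and multipliers.

```python
def nonsmooth_residual(g, lam):
    """Componentwise min{-g_j, lam_j}; zero exactly when complementarity holds."""
    return [min(-gj, lj) for gj, lj in zip(g, lam)]


def residual_is_zero(g, lam, tol=1e-12):
    """True if the nonsmooth constraint of Problem (6.3) holds for every j."""
    return all(abs(r) <= tol for r in nonsmooth_residual(g, lam))


def satisfies_complementarity(g, lam, tol=1e-12):
    """Check g_j <= 0, lam_j >= 0 and lam_j * g_j = 0 for every j."""
    return all(
        gj <= tol and lj >= -tol and abs(gj * lj) <= tol
        for gj, lj in zip(g, lam)
    )
```

For example, `g = [-1.0, 0.0]` with `lam = [0.0, 2.0]` passes both checks,
whereas `lam = [0.5, 2.0]` fails both because the first constraint is
inactive but carries a positive multiplier.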

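Then, every local solution of Problem (6.2) is a local solution of Problem
(6.3). It is not difficult to see that every local solution
$(x^*, y^*, \lambda^*)$ of Problem (6.3) is also a local solution of the
problem

\[
\begin{aligned}
\min_{x, y, \lambda}\quad & F(x, y) \\
\text{s.t.}\quad & G(x) \le 0, \\
& \nabla_y L(x, y, \lambda) = 0, \\
& g_j(x, y) = 0 \quad \text{for } g_j(x^*, y^*) = 0, \\
& \lambda_j = 0 \quad \text{for } \lambda_j^* = 0, \\
& g_j(x, y) \le 0 \quad \text{for } g_j(x^*, y^*) < 0, \\
& \lambda_j \ge 0 \quad \text{for } \lambda_j^* > 0.
\end{aligned}
\tag{6.4}
\]
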
Let us consider the Mangasarian-Fromowitz constraint qualification [68]:


Definition 6.2. We say that the Mangasarian-Fromowitz constraint
qualification (MFCQ) is satisfied at the point $(x, y)$ if there exists a
vector $d \in \mathbb{R}^m$ satisfying

\[
d^T \nabla_y g_j(x, y) < 0 \quad \text{for all } j \in I(x, y) = \{i \mid g_i(x, y) = 0,\ i = 1, \dots, q\}.
\]


Now, we are ready to present a theorem [35], which is an application of the
results in [80], for Problem (6.3).


Theorem 6.1. Let $(x^*, y^*, \lambda^*)$ be a local minimum of Problem (6.3).
Suppose that (MFCQ) is satisfied for Problem (6.4) at
$(x^*, y^*, \lambda^*)$. Then, there exist multipliers $(u, v, w, r)$ such
that

\[
\begin{aligned}
\nabla F(\mathbf{x}^*) + u^T (\nabla_x G(x^*), 0)
  + \nabla\big(\nabla_y L(\mathbf{x}^*, \lambda^*)\, v\big)
  + w^T \nabla g(\mathbf{x}^*) &= 0, \\
\nabla_y g(\mathbf{x}^*)\, v - r &= 0, \\
g_j(\mathbf{x}^*)\, w_j &= 0, \quad j = 1, \dots, q, \\
\lambda_j^* r_j &= 0, \quad j = 1, \dots, q, \\
u^T G(x^*) &= 0, \\
u &\ge 0,
\end{aligned}
\]

where $\mathbf{x}^* = (x^*, y^*)$.


Vicente and Calamai [89] presented necessary and sufficient optimality
conditions for bilevel programming problems with strictly convex quadratic
lower-level problems, based on the local geometry of the problems. Using
this geometrical property, they observed that the set of feasible directions
at a point is a finite union of convex sets, and extended the first- and
second-order optimality conditions of single-level programming to bilevel
programming.


6.4 Algorithms 


Algorithmic approaches for the bilevel problem may be grouped as follows:
(1) extreme point ranking methods, (2) branch-and-bound algorithms,
(3) complementarity pivot algorithms, (4) descent methods, (5) penalty
function methods, and (6) trust region methods. In this section, we discuss
some of these algorithmic approaches for solving the bilevel programming
problem. In particular, extreme point algorithms, branch-and-bound
algorithms, and a multicriteria approach are covered. Complementarity
pivoting is a method based on replacing the lower-level problem with its
Karush-Kuhn-Tucker conditions. Many authors have suggested different pivot
algorithms for the modified problem, including [18, 59, 73]. Descent methods
[79, 90, 62, 40] look for a local solution of the bilevel problem based on
feasible descent directions with respect to the upper-level function. Penalty
function methods [4, 5, 60, 20, 93] are also local search methods for solving
the bilevel problem.


6.4.1 Extreme Point Algorithms 


As we mentioned before, the linear bilevel programming problem has the nice
property that the solution of the problem occurs at an extreme point of the
feasible set. Therefore, the problem can be solved using a vertex enumeration
technique. Candler and Townsley [24] proposed the first enumeration algorithm
for the linear bilevel programming problem with no upper-level constraints
and a lower-level problem that has a unique solution. Their algorithm
enumerates the bases of the lower-level problem.

One of the well-known vertex enumeration algorithms is the kth best
method introduced by Bialas and Karwan [17]. The kth best method enumerates
the bases of the problem obtained from the bilevel problem by relaxing
the objective function of the lower-level problem.
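
The ranking idea can be sketched in a few lines of Python. The code below is
a brute-force illustration in the spirit of extreme point ranking, not the
method of Bialas and Karwan itself: it lists all vertices of a small
two-dimensional feasible polygon by intersecting pairs of constraint lines,
sorts them by the upper-level objective, and returns the first vertex that is
also optimal for the lower level. The instance data and all names are
assumptions made for illustration.

```python
from itertools import combinations

# Constraints a1*x + a2*y <= b of an assumed toy instance (not from the text).
# Upper level: min F = x - 4y.  Lower level: min f = y for fixed x.
CONSTRAINTS = [
    (-1.0, -1.0, -3.0),
    (-2.0, 1.0, 0.0),
    (2.0, 1.0, 12.0),
    (3.0, -2.0, 4.0),
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
TOL = 1e-9


def upper_objective(x, y):
    return x - 4.0 * y


def is_feasible(x, y):
    return all(a1 * x + a2 * y <= b + TOL for a1, a2, b in CONSTRAINTS)


def vertices():
    """Feasible intersections of pairs of constraint lines (extreme points)."""
    found = []
    for (a1, a2, b), (c1, c2, d) in combinations(CONSTRAINTS, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < TOL:
            continue                      # parallel lines never intersect
        x = (b * c2 - a2 * d) / det       # Cramer's rule
        y = (a1 * d - b * c1) / det
        duplicate = any(abs(x - vx) < TOL and abs(y - vy) < TOL for vx, vy in found)
        if is_feasible(x, y) and not duplicate:
            found.append((x, y))
    return found


def lower_level_min_y(x):
    """Smallest feasible y for fixed x (the lower level minimizes f = y)."""
    return max((b - a1 * x) / a2 for a1, a2, b in CONSTRAINTS if a2 < 0)


def kth_best():
    """Rank vertices by F and return the first one that is lower-level optimal."""
    ranked = sorted(vertices(), key=lambda v: upper_objective(*v))
    for k, (x, y) in enumerate(ranked, start=1):
        if abs(y - lower_level_min_y(x)) < 1e-7:
            return k, (x, y), upper_objective(x, y)
    return None
```

On this instance the vertex with the smallest upper-level value, $(3, 6)$, is
rejected because $y = 6$ is not the lower-level response to $x = 3$; the
second-ranked vertex $(4, 4)$ is accepted with $F = -12$. Practical
implementations move between adjacent bases rather than enumerating all
vertices in advance.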

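Tuy et al. [86] proposed a global optimization approach for the linear
two-level problem:

\[
\begin{aligned}
\min_{x \ge 0}\quad & c_u^T x + d_u^T y \\
\text{s.t.}\quad & A_u x + B_u y \le r_u, \\
& \text{where } y \text{ solves} \\
& \quad \min_{z \ge 0}\ c_l^T x + d_l^T z \\
& \quad \text{s.t.}\ \ A_l x + B_l z \le r_l, \\
& x \in \mathbb{R}^n, \quad y, z \in \mathbb{R}^m.
\end{aligned}
\tag{6.5}
\]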


We can choose $c_l = 0$ because the value of this parameter does not affect
the solution of the linear bilevel problem. Therefore, Problem (6.5) is
equivalent to the following reverse convex programming problem:

\[
\begin{aligned}
\min\quad & c_u^T x + d_u^T y \\
\text{s.t.}\quad & A x + B y \le r, \\
& d_l^T y \le S(x), \\
& x, y \ge 0,
\end{aligned}
\]

where $A = (A_u, A_l)^T$, $B = (B_u, B_l)^T$, $r = (r_u, r_l)^T$, and $S(x)$
is the optimal objective function value of the lower-level problem

\[
\begin{aligned}
\min_{z}\quad & d_l^T z \\
\text{s.t.}\quad & A_l x + B_l z \le r_l, \\
& z \ge 0.
\end{aligned}
\]


The function $S(x)$ is a convex polyhedral function, so the problem is a
linear program with an additional reverse convex constraint. Several
algorithms [49, 83] are available for problems of this type. Tuy et al.
applied the approach proposed by Tuy [84] to reduce the dimension of the above
optimization problem. Then, a vertex enumeration technique is applied to
solve the resulting problem.

Zhang and Liu [96] proposed an extreme point algorithm to solve a
mathematical programming problem with variational inequalities. As we
mentioned before, this problem generalizes the bilevel programming problem in
which the lower-level problem is convex. Some other vertex enumeration methods
can be found in [32, 26].


6.4.2 Branch-and-Bound Algorithms 


One way to tackle the bilevel problem is to replace the lower-level problem by
its corresponding Karush-Kuhn-Tucker conditions and obtain a single-level
mathematical programming problem. However, if the lower-level problem is
nonconvex, the corresponding single-level problem is no longer equivalent to
the bilevel problem. Most of the branch-and-bound algorithms for the bilevel
programming problem are based on the single-level programming problem

\[
\begin{aligned}
\min\quad & F(x, y) \\
\text{s.t.}\quad & G(x, y) \le 0, \quad g(x, y) \le 0, \\
& \nabla_y f(x, y) + \sum_{j=1}^{q} \lambda_j \nabla_y g_j(x, y) = 0, \\
& \lambda_j g_j(x, y) = 0, \quad j = 1, \dots, q, \\
& \lambda \ge 0, \quad x \in X, \quad y \in Y.
\end{aligned}
\]

Due to the stationarity and complementarity conditions, the above problem is
nonconvex even if the bilevel problem is linear.

Bard and Moore [12] implemented a branch-and-bound algorithm, which
was initially suggested by Fortuny-Amat and McCarl [41], for solving the
bilevel problem where the constraints and the objective function of the
upper-level problem are linear and the objective function of the lower-level
problem is quadratic.

Hansen et al. [46] derived necessary optimality conditions for linear bilevel 
programming, expressed in terms of active constraints of the lower-level 
problem. Based on the optimality conditions, they established a branch-and- 
bound algorithm for linear bilevel programming. Their computational results 
showed that this approach was favorable for the linear case compared to the 
methods by Bard and Moore [12] and Judice and Faustino [58]. Al-Khayyal 
et al. [6] and Bard [10] also introduced branch-and-bound algorithms for the 
quadratic bilevel problem. 

Gümüs and Floudas [45] proposed a branch-and-bound algorithm for the
bilevel programming problem. Their approach is based on a relaxation of the
feasible region of the bilevel problem obtained from the Karush-Kuhn-Tucker
conditions. The relaxed problem is solved using the global optimization
technique described in [2, 3]. When the objective and constraint functions are
twice differentiable and the linear independence constraint qualification
holds for the lower-level program constraints, this approach guarantees global
optimality. Bard and Moore [13] and Wen and Yang [92] have also proposed
branch-and-bound algorithms for integer and mixed integer linear bilevel
programming problems.


6.4.3 A Multicriteria Approach


Several authors have been interested in the relationship between bicriteria
programming and the bilevel programming problem including Bard [9] and
Ünlü [87] for the linear case. They claimed that an optimal solution of the
linear bilevel problem is a nondominated point for the objective functions of
the upper-level and the lower-level programming problems. However, Candler
[23], Clark and Westerberg [30], and Haurie et al. [48] reported
counterexamples to this argument. Fülöp [42] pointed out that more than two
criteria are needed to establish the relationship between multicriteria
programming and bilevel programming. Fülöp showed that, for each linear
bilevel programming problem, there exists some linear multicriteria problem
such that the global solution of the first problem is an optimal solution for
minimizing the upper-level objective function over the Pareto optimal set of
the second problem. In order to discuss a multicriteria approach, we need to
define the concept of domination.


Definition 6.3. A point $a$ in a set $M$ is said to be nondominated with
respect to an order "$\preceq$" if there does not exist any point $b \in M$
such that $b \preceq a$ and $a \neq b$.


Recently, Fliege and Vicente [39] established the link between these two
programs. Let us consider the bilevel programming problem (6.1) with
$X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$.

They defined an order such that every nondominated point of
$\mathbb{R}^n \times \mathbb{R}^m$ is a solution to the bilevel problem. Let
$(x_1, y_1), (x_2, y_2) \in \mathbb{R}^n \times \mathbb{R}^m$. Then the
order "$\preceq$" can be defined as follows.


Definition 6.4. $z^1 = (x_1, y_1) \preceq z^2 = (x_2, y_2)$ is equivalent to
the following conditions.

  - $x_1 = x_2$ and $f(x_1, y_1) < f(x_2, y_2)$
  or
  - $\|\nabla_y f(x_1, y_1)\|_2 = 0$ and $F(x_1, y_1) < F(x_2, y_2)$.


Then, they proved that a nondominated point of
$\mathbb{R}^n \times \mathbb{R}^m$, with respect to the order defined above,
is a solution of the bilevel problem. Let us define the following function,

\[
\varphi : (x, y) \mapsto \big(x,\ F(x, y),\ f(x, y),\ \|\nabla_y f(x, y)\|_2\big).
\]

A cone $K$ can be defined in the image space of $\varphi$,
$\mathbb{R}^n \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}$, as

\[
K = \{(x, f_1, f_2, d) \mid (x = 0 \text{ and } f_2 > 0) \text{ or } (f_1 > 0 \text{ and } d \ge 0)\}.
\]

Then, we can define the usual cone order "$\le_K$" as

\[
a \le_K b \iff b - a \in K.
\]


Therefore, the relation between these two orders is given by the following
theorem [39].


Theorem 6.2. If $\varphi(x, y)$ is a nondominated point with respect to
"$\le_K$", then $(x, y)$ is nondominated with respect to "$\preceq$".


This theorem says that if we find a nondominated point with respect to
"$\le_K$", then the point is an optimal solution of the bilevel program.
Fliege and Vicente proposed a multicriteria approach to compute nondominated
points with respect to the order.


6.5 Multivariate Partition Approach 


In this section we discuss an approach introduced by Huang and Pardalos 
[52] for solving a nonlinear single-level mathematical programming problem 
based on its equivalent multilevel programming formulation. Let us consider 
the following multivariate optimization problem 


\[
\begin{aligned}
\min\quad & f(x) \\
\text{s.t.}\quad & x \in D,
\end{aligned}
\tag{6.6}
\]

where $D \subset \mathbb{R}^n$ ($n > 1$) is a robust set, and $f(x)$ is a
continuous function on $D$.

Huang and Pardalos [52] proposed a general approach, which was called 
the multivariate partition approach, for finding a stationary point or a local 
minimum of Problem (6.6). This method is based on a multilevel optimization 
formulation of Problem (6.6) and uses the following idea. We partition all the 
independent variables of Problem (6.6) into a number of groups and find an 
improved solution of Problem (6.6) based on solutions of some small-sized 
problems with respect to each group of variables. 

In order to present an equivalent multilevel optimization problem of Prob- 
lem (6.6), we need to define the concept of a partition of a set. 


Definition 6.5. For a given set $S$, a set $\{A_1, \dots, A_p\}$ is called a
partition of the set $S$ if the following conditions are satisfied.

  (i)   $A_i \subset S$ is nonempty for $i = 1, \dots, p$.
  (ii)  $A_i \cap A_j = \emptyset$ for $i \neq j$ and $i, j = 1, \dots, p$.
  (iii) $\bigcup_{i=1}^{p} A_i = S$.

Let $S = \{x_1, \dots, x_n\}$, and let $\bar{A}_i = S \setminus A_i$ for
$i = 1, \dots, p$. Then a multilevel optimization formulation of Problem (6.6)
can be written as

\[
\min_{y_{\sigma_1} \in D_{\sigma_1}} \Big\{ \min_{y_{\sigma_2} \in D_{\sigma_2}} \Big\{ \cdots \min_{y_{\sigma_p} \in D_{\sigma_p}} f(A_1, \dots, A_p) \Big\} \Big\},
\tag{6.7}
\]

where $\sigma = (\sigma_1, \dots, \sigma_p)$ is a permutation of the set
$\{1, \dots, p\}$ and $D_{\sigma_i}$ is the feasible domain of the variable
$y_{\sigma_i}$ corresponding to the set $A_{\sigma_i}$ for each
$i = 1, \dots, p$.

A special case of Problem (6.7) is the bilevel optimization problem. Let
$\{I^-, I^+\}$ be a partition of the index set $I = \{1, \dots, n\}$ of
Problem (6.6). Let us denote the corresponding subsets of $S$ with respect to
$I^-$ and $I^+$ by $S^-$ and $S^+$, respectively. We also denote the feasible
domains of the variables $y^-$ corresponding to the set $S^-$ and $y^+$
corresponding to the set $S^+$ by $D_{S^-}$ and $D_{S^+}$, respectively. Then
the bilevel optimization formulation of Problem (6.6) can be written in the
form

\[
\min_{y^- \in D_{S^-}} g(S^-),
\tag{6.8}
\]

where $g(S^-)$ is the optimal value of the problem

\[
\min_{y^+ \in D_{S^+}} f(S^-, S^+).
\tag{6.9}
\]

The multivariate partition approach consists of the following steps.


Strategy I. For each $i = 1, \dots, p$, we obtain an improved point
  $\bar{y}_{\sigma_i}$ corresponding to $S^+ = A_{\sigma_i}$ by solving
  Problem (6.9) with $S^- = \bar{A}_{\sigma_i} = S \setminus A_{\sigma_i}$.
or

Strategy II. For each $i = 1, \dots, p$, we denote
  $A_i^- = \{\bar{y}_{\sigma_j} \mid j < i,\ j = 1, \dots, p\}$,
  $A_i^+ = \{y_{\sigma_j} \mid j > i,\ j = 1, \dots, p\}$, and
  $S^- = A_i^- \cup A_i^+$. We obtain an improved point $\bar{y}_{\sigma_i}$
  corresponding to $S^+ = A_{\sigma_i}$ by solving Problem (6.9) with $S^-$.

Improved Point. Based on the improved exploratory point
  $\bar{y} = (\bar{y}_{\sigma_1}, \dots, \bar{y}_{\sigma_p})$, we obtain an
  improved feasible point $\hat{y}$ using a search and decision scheme.


The convergence of the multivariate partition approach was proved in
[52] under some assumptions. We note that it is sufficient to find local
solutions of Problems (6.9) in order to find a local solution of Problem (6.6)
in the algorithm. However, even if we find global solutions of Problems (6.9)
in Strategies I and II, there is no guarantee that we can find a global
solution of Problem (6.6). Huang and Pardalos showed that the multivariate
partition approach could be regarded as an extension of the Jacobi and
Gauss-Seidel iterative algorithms. Next, we show some examples of the
multivariate partition approach.
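
As a rough illustration of Strategy II, the following Python sketch cycles
through the groups of a partition and approximately solves each block
subproblem with the other variables held at their most recent values, in the
manner of a Gauss-Seidel sweep. It is a minimal sketch under stated
assumptions rather than the implementation of [52]: the block solver is a
crude compass search chosen only to keep the example self-contained, each
block result is accepted directly in place of a separate search and decision
scheme, and the objective and all names are hypothetical.

```python
def minimize_block(f, x, block, step=1.0, tol=1e-8):
    """Approximately minimize f over the coordinates in `block`, others fixed.

    A simple compass (pattern) search stands in for the block solver.
    """
    x = list(x)
    fx = f(x)
    while step > tol:
        improved = False
        for i in block:
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5          # refine the search once no move helps
    return x


def partition_descent(f, x0, partition, sweeps=50):
    """Sweep over the groups of the partition, updating one group at a time."""
    x = list(x0)
    for _ in range(sweeps):
        for block in partition:
            x = minimize_block(f, x, block)
    return x


def example_objective(x):
    """Assumed convex test function with minimizer (1, 1, 1)."""
    return (x[0] - 1.0) ** 2 + (x[1] - x[0]) ** 2 + (x[2] - x[1]) ** 2
```

Calling `partition_descent(example_objective, [5.0, -3.0, 2.0], [[0, 1], [2]])`
uses the partition $A_1 = \{x_1, x_2\}$, $A_2 = \{x_3\}$ and approaches the
minimizer $(1, 1, 1)$. A Strategy I variant would instead solve every block
subproblem from the same current point before combining the results.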

6.5.1 Examples


K-Means Type Algorithms 


Clustering is the unsupervised classification of patterns. The K-means
[66] method is one of the most popular clustering methods and has been
applied in a variety of fields including pattern recognition, information
retrieval, document extraction, and microbiology analysis. We now show that
the K-means algorithm can be seen as a special case of the multivariate
partition approach. The goal of this method is to classify a given data set
into a certain number of clusters such that some metric relative to the
centroids of the clusters is minimized. We can define our problem
mathematically as follows.

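Suppose that we are given a set $X$ of a finite number of points in
$d$-dimensional Euclidean space $\mathbb{R}^d$, that is,

\[
X = \{x^1, x^2, \dots, x^n\}, \quad \text{where } x^i \in \mathbb{R}^d,\ i = 1, 2, \dots, n.
\]

We aim at finding a partition $C_j \neq \emptyset$, $j = 1, 2, \dots, k$:

\[
X = \bigcup_{j=1}^{k} C_j, \qquad C_j \cap C_l = \emptyset \ \text{ for all } j \neq l,
\]

of $X$, which minimizes the squared error function

\[
g(c^1, \dots, c^k) = \sum_{j=1}^{k} \sum_{x^i \in C_j} \|x^i - c^j\|^2,
\]

where $\|\cdot\|$ denotes the Euclidean norm, $c^j$ is the center of the
cluster $C_j$,

\[
c^j = \frac{1}{|C_j|} \sum_{x^i \in C_j} x^i, \quad j = 1, 2, \dots, k,
\tag{6.10}
\]

and $|C_j|$ is the cardinality of $C_j$, $j = 1, 2, \dots, k$.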

We note that, in optimal solutions, a cluster can be represented by its 
centroid because each point should be assigned to the cluster that has the 
closest centroid. Thus, we can think of variables of our optimization problem 
as centroids of the clusters. 

The algorithm is composed of the following steps. 

Algorithm 1. (K-means) An algorithm for solving the clustering problem.


  Step 1. Initialize the centroids $c_0^j$, $j = 1, 2, \dots, k$. Set $q = 0$.
  Step 2. Assign each point $x^i$ ($i = 1, 2, \dots, n$) to the cluster that
          has the closest centroid $c_q^j$ ($j \in \{1, 2, \dots, k\}$),
          that is,

          \[ j = \arg\min_{1 \le l \le k} \|x^i - c_q^l\|^2. \]

  Step 3. When all points have been assigned, for $j = 1, 2, \dots, k$,
          calculate the new position $c_{q+1}^j$ of the centroid $j$ using
          equation (6.10).
  Step 4. If $c_{q+1}^j = c_q^j$ for all $j = 1, 2, \dots, k$, then stop;
          otherwise set $q = q + 1$ and go to Step 2.
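
The four steps of Algorithm 1 translate almost line by line into code. The
following pure-Python sketch is one possible rendering; the function names,
the tie-breaking rule (lowest cluster index), the handling of empty clusters
(the old centroid is kept), and the iteration cap `max_iter` are assumptions
that the algorithm statement leaves open.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Step 2: index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Step 3: cluster means as in equation (6.10); empty clusters keep their centroid."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            mean = tuple(
                sum(p[d] for p in members) / len(members) for d in range(len(old))
            )
            new_centroids.append(mean)
        else:
            new_centroids.append(tuple(old))
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]       # Step 1
    labels = assign(points, centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)                   # Step 2
        new_centroids = update(points, labels, centroids)    # Step 3
        if new_centroids == centroids:                       # Step 4
            break
        centroids = new_centroids
    return centroids, labels
```

For the four points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` with initial
centroids `(0, 0)` and `(1, 1)`, the sketch stops with centroids `(0, 0.5)`
and `(10, 10.5)`. As with the algorithm itself, the result depends on the
initial centroids and is in general only a local solution.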


From the algorithm, we can see that Step 3 uses Strategy I of the
multivariate partition approach. Equation (6.10) finds a local solution of
the problem

\[
\begin{aligned}
\min\quad & g(c^1, \dots, c^k) \\
\text{s.t.}\quad & c^j \in \mathbb{R}^d
\end{aligned}
\]

while all other centroids (except centroid $j$) are fixed. One way to find
a solution that is close to a global minimum is to locate centroid $j$ at
each of the points $x^1, \dots, x^n$ in turn and to choose the location with
the least value of $g$ after reassigning the points to the centroids. This
idea is closely related to the j-means clustering algorithm [47].


Coordinate Descent Method 


If $p = n$, that is, $A_i = \{x_i\}$ for $i = 1, \dots, n$ in the partition of
the multivariate partition approach, then many existing optimization
algorithms designed for solving Problem (6.6), including the coordinate
descent method, are closely related to this partition. Basically, each
coordinate axis is searched, and a descent is made only along a unit vector.
The cyclic coordinate descent method minimizes a function
$f(x_1, \dots, x_n)$ cyclically with respect to the coordinate variables. That
is, first $x_1$ is searched, then $x_2$, and so on. Many variations are
possible. One advantage of these methods is their easy implementation.
Hirsch et al. (2007) proposed a GRASP (greedy randomized adaptive search
procedure [50]), which is a special case of the coordinate descent method, for
solving the following nonlinear programming problem with box constraints:

\[
\begin{aligned}
\min\quad & f(x) \\
\text{s.t.}\quad & l \le x \le u,
\end{aligned}
\]

where $f : \mathbb{R}^n \to \mathbb{R}$, and $l$, $u$ are lower and upper
bounds for the values of the variable $x$.

GRASP starts with an initial point $x = (l + u)/2$. Then, for each
$i = 1, \dots, n$, we solve a problem in the single variable $x_i$ while the
other coordinates of the current solution $x$ are fixed. These problems are
solved by discretizing the solution space using the grid density $h$. Based
on solutions to these problems, the method finds an improved feasible point.
Then the above procedure is repeated. If the number of consecutive iterations
with no improvement exceeds $N$, the value of $h$ is halved, and the process
is restarted. We now present the algorithm.


Algorithm 2. (GRASP) An algorithm for solving nonlinear programming
problems


  Input: $K$, $N$, $f$, $l$, $u$.

  Initialization. Set $x = (l + u)/2$, $h = 1$, $Val = f(x)$, $k = 1$, $j = 0$.

  Step 1. If $k > K$, terminate the algorithm; $x$ is the best solution found
          (not necessarily a global optimum). If $j > N$, set $h := h/2$
          and $j := 0$. Find a feasible point $y \in \mathbb{R}^n$ using the
          procedure GreedyRandomSol($x$, $h$).

  Step 2. If $f(y) < f(x)$, set $j := 0$, $x := y$, and $Val := f(x)$.
          Otherwise, set $j := j + 1$.

  Step 3. Set $k := k + 1$, and go to Step 1.


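Procedure GreedyRandomSol can be described as follows.

Algorithm 3. GreedyRandomSol


  Input: $f$, $l$, $u$, $h$, $\alpha \in (0, 1)$, $x$.
  Step 1. For each $i = 1, \dots, n$, find a solution $x_i^*$ of the problem

          \[
          \begin{aligned}
          \min_{x_i}\quad & f(x_1, \dots, x_n) \\
          \text{s.t.}\quad & l_i \le x_i \le u_i,
          \end{aligned}
          \tag{6.11}
          \]

          with $x_t$ fixed for $t \neq i$, and let $g_i$ denote the
          corresponding objective function value.

  Step 2. Let $min$ and $max$ be the minimum and the maximum values of
          the set $\{g_1, \dots, g_n\}$, respectively. Let $S = \emptyset$.
          For each $i = 1, \dots, n$, set $S := S \cup \{i\}$ if
          $g_i \le min + \alpha (max - min)$.

  Step 3. Select an index $j$ from the set $S$ randomly, and set $y := x$ and
          $y_j := x_j^*$.
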
Problem (6.11) can be solved by discretizing the solution space. The mesh
points can be chosen as $l_i, l_i + h, l_i + 2h, \dots, l_i + mh$, where $m$
is the largest integer such that $l_i + mh \le u_i$. Then we choose the
minimum of the objective function values at the mesh points. It is not
difficult to see that GRASP is a special case of the multivariate partition
approach. In this case, the partition is $\{A_1, \dots, A_n\}$, where
$A_i = \{x_i\}$ for all $i = 1, \dots, n$, and $D_i = [l_i, u_i]$ for all
$i = 1, \dots, n$.
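
Algorithms 2 and 3 can be put together in a short Python sketch. This is a
minimal illustration of the procedure as described above, not the code of
Hirsch et al.; the function names, the default parameter values, and the use
of a seeded `random.Random` instance are assumptions. The arguments
`max_iter` and `patience` play the roles of $K$ and $N$.

```python
import random


def coordinate_grid_search(f, x, i, lower, upper, h):
    """Solve (6.11) on the mesh lower[i], lower[i] + h, ... <= upper[i]."""
    m = int((upper[i] - lower[i]) / h + 1e-12)   # largest m with lower[i] + m*h <= upper[i]
    best_value, best_coord = float("inf"), x[i]
    for k in range(m + 1):
        trial = list(x)
        trial[i] = lower[i] + k * h
        value = f(trial)
        if value < best_value:
            best_value, best_coord = value, trial[i]
    return best_coord, best_value


def greedy_random_sol(f, x, lower, upper, h, alpha, rng):
    """Algorithm 3: build the restricted candidate set S and pick one index at random."""
    results = [coordinate_grid_search(f, x, i, lower, upper, h) for i in range(len(x))]
    values = [value for _, value in results]
    g_min, g_max = min(values), max(values)
    candidates = [i for i, g in enumerate(values) if g <= g_min + alpha * (g_max - g_min)]
    j = rng.choice(candidates)
    y = list(x)
    y[j] = results[j][0]
    return y


def grasp(f, lower, upper, max_iter=40, patience=5, alpha=0.3, seed=0):
    """Algorithm 2: accept improving points, halve h after repeated failures."""
    rng = random.Random(seed)
    x = [(lo + up) / 2.0 for lo, up in zip(lower, upper)]
    h, stall = 1.0, 0
    for _ in range(max_iter):
        if stall > patience:
            h, stall = h / 2.0, 0
        y = greedy_random_sol(f, x, lower, upper, h, alpha, rng)
        if f(y) < f(x):
            x, stall = y, 0
        else:
            stall += 1
    return x
```

Because the mesh is refined only by halving $h$, the cost of each line search
doubles with every refinement, so the iteration limit should be kept moderate.
For the assumed test function $(x_1 - 1)^2 + (x_2 + 0.5)^2$ on the box
$[-2, 2]^2$, the sketch first moves to $(1, 0)$ on the unit mesh and reaches
$(1, -0.5)$ once $h$ has been halved.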

6.5.2 Applications of the Multivariate Partition Approach


Lennard-Jones Problem 


The Lennard-Jones problem is one of the most challenging global optimization
problems in molecular biophysics. The problem is to find a structure of a
protein, modeled as a cluster of $N$ atoms interacting via the Lennard-Jones
(dimensionless nonquantized pair) potential $v(r) = r^{-12} - 2r^{-6}$, such
that its energy $E$ is (globally) minimal. Let the code
$P_N = \{x_1, \dots, x_N\}$ be the collection of centers of the $N$ atoms.
Then the potential energy $E$ is defined as follows.

\[
E(x_1, \dots, x_N) = \sum_{1 \le i < j \le N} v(\|x_i - x_j\|),
\]

where $v(r) = r^{-12} - 2r^{-6}$. Therefore, our optimization problem can be
formulated as

\[
\begin{aligned}
\min\quad & E(x_1, \dots, x_N) = \sum_{1 \le i < j \le N} v(\|x_i - x_j\|) \\
\text{s.t.}\quad & x_1, \dots, x_N \in \mathbb{R}^3.
\end{aligned}
\]

Huang et al. [53] presented necessary optimality conditions for the
Lennard-Jones problem and applied the multivariate partition approach to the
Lennard-Jones problem. The partition can be described by the center of each
atom; that is, $A_i = \{x_i\}$ for $i = 1, \dots, N$. They obtained all the
expected global minima of the problem with $2 \le N \le 56$ using a
quasi-Newton accelerating approach for the auxiliary problem (6.9).
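
To make the objective concrete, the following Python sketch evaluates the
energy $E$ for a list of atom centers, together with the part of $E$ that
depends on a single atom, which is the quantity a block subproblem for
$A_i = \{x_i\}$ would minimize. The function names are hypothetical and the
sketch only evaluates the functions; it does not reproduce the quasi-Newton
scheme of [53].

```python
import math
from itertools import combinations


def pair_potential(r):
    """Scaled Lennard-Jones pair potential v(r) = r^-12 - 2 r^-6 (minimum -1 at r = 1)."""
    return r ** -12 - 2.0 * r ** -6


def energy(centers):
    """Total energy E: sum of v(||x_i - x_j||) over all pairs i < j."""
    return sum(pair_potential(math.dist(a, b)) for a, b in combinations(centers, 2))


def single_atom_energy(centers, i):
    """Terms of E that involve atom i; only these change when x_i is moved."""
    return sum(
        pair_potential(math.dist(centers[i], other))
        for j, other in enumerate(centers)
        if j != i
    )
```

Two atoms at unit distance give $E = -1$, and three atoms at the corners of a
unit equilateral triangle give $E = -3$, since every pair then sits at the
minimum of $v$. Moving one atom requires re-evaluating only $N - 1$ pair
terms, which keeps each block subproblem small.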


Spherical Code Problem 


The spherical code problem has many applications in fields including physics,
molecular biology, and chemistry. Tammes' problem is one of the well-known
cases of the spherical code problem; it asks for a distribution of points on
a unit sphere that maximizes the minimum distance between any pair of points.
In general, we distribute points on a unit sphere according to a certain
generalized energy. Let a code $P_N = \{x_1, \dots, x_N\}$ be a set of $N$
points on a unit sphere in $\mathbb{R}^n$. Then the s-energy associated with
the spherical code $P_N$ is

\[
w(s, P_N) = \sum_{i < j} \|x_i - x_j\|^{-s} \quad \text{if } s \neq 0,
\]

and the spherical code problem can be formulated mathematically as

\[
\begin{aligned}
\min\quad & f_s(x_1, \dots, x_N) \\
\text{s.t.}\quad & x_i \in S^{n-1} = \{x \mid \|x\| = 1,\ x \in \mathbb{R}^n\} \quad \text{for } i = 1, \dots, N,
\end{aligned}
\]

where

\[
f_s(P_N) =
\begin{cases}
w(s, P_N) & \text{if } s > 0, \\
-w(s, P_N) & \text{if } s < 0.
\end{cases}
\]

In [54], Huang et al. used a multivariate partition approach for the
spherical code problems. The partition of the variables is similar to that of
the Lennard-Jones problem.


6.6 Applications 


Multilevel optimization has many applications in different fields including 
economics, transportation systems, engineering system design, environmental 
engineering, and mechanics. Here, we discuss two of these applications in 
transportation systems. 

The concentration of human activities in urban areas has given rise to
congestion problems that create negative environmental or economic effects.
Therefore, many researchers have developed efficient methods for road
network design in order to improve transportation systems. The network
design problem (NDP) determines a set of parameters that optimizes the road
network. The model includes traffic signal control, traffic information
provision, congestion charges, new transportation modes, and road expansion
or deletion. The NDP is usually formulated as a bilevel programming problem.
The upper-level part defines system design and the lower-level part defines 
travellers’ behavior. NDPs are classified into two categories: discrete and con- 
tinuous variations. LeBlanc [63] formulated a bilevel programming problem 
for discrete NDP. Discrete models usually consider link or lane additions. 
Abdulaal and LeBlanc [1] later described continuous models of the NDP. 
Continuous models are concerned with network improvements that can be 
modeled as continuous variables including lane and lateral clearance changes 
and other enhancements. The NDP has also been investigated by Janson and 
Husaini [55], Magnanti and Wong [67], and Xiong and Schneider [94]. For a 
more comprehensive survey about the NDP, the reader is referred to Migdalas 
[71] and Cascetta [25]. 

Another application of bilevel programming is the signal setting problem 
(SSP) which maximizes network performance by optimizing traffic signals. 
The main difficulty of the problem arises from the existing interaction be- 
tween network performance and users’ choices of the routes. Gartner et al. 
[44] formulated the SSP as a bilevel programming problem, where the upper- 
level part represents the public network manager and the lower-level part 
represents users’ behavior.

Some other applications of multilevel optimization can be found in
Migdalas [71], Migdalas et al. [72], and Bard [11].



6.7 Some Interesting Open Problems 


6.7.1 Polynomially Solvable Problems 


As we mentioned before, the linear bilevel programming problem is known to
be NP-hard. However, if the lower-level problem has a constant number of
variables, the linear bilevel problem is polynomially solvable. A similar
question then arises: is there a polynomial-time algorithm for the linear
bilevel problem in which the upper-level problem has a constant number of
variables? It would also be interesting to find polynomially solvable special
cases of the nonlinear bilevel programming problem.


6.7.2 The Relation Between Multilevel Programming 
and Multicriteria Programming 


In Section 6.4, we discussed a bicriteria approach by Fliege and Vicente [39] 
for solving bilevel problems. They introduced an order in the Euclidean space 
and showed that an optimal solution of the bilevel problem is a nondomi- 
nated point with respect to the order. However, they did not exactly define 
any multicriteria problem. The problem of finding equivalent multicriteria 
programs of multilevel programs is still an open question. As we mentioned 
before, several attempts, including Bard [7] and Ünlü [87], have been made
to discover the relation between bilevel linear programming and multicrite- 
ria programming. Later, Wen and Hsu [91] pointed out that the theorem in 
[87], which illustrates the relation between multilevel programming and mul- 
ticriteria programming, is valid only under an additional constraint. Let us 
consider the linear bilevel programming problem 


$$\min_{x}\ F(x,y) = c_1^{T}x + d_1^{T}y \qquad (6.12)$$

where y solves

$$\min_{z}\ f(x,z) = c_2^{T}x + d_2^{T}z$$
$$\text{s.t. } Ax + Bz \le r,$$
$$x \in R^{n},\quad y, z \in R^{m},$$

and the multicriteria programming problem

$$\min\ (c_1^{T}x + d_1^{T}y,\ d_2^{T}z) \qquad (6.13)$$
$$\text{s.t. } Ax + Bz \le r.$$


Then, Wen and Hsu stated the following theorem. 


Theorem 6.3. If $\nabla F^{T}\nabla f > 0$, where $f = f(0,y)$, then the
optimal solution to the bilevel programming problem (6.12) is an efficient
solution to the bicriteria programming problem (6.13).


The condition in the above theorem restricts the use of bicriteria approaches
to the linear bilevel problem. Thus, it would be useful to derive an
equivalent multicriteria programming problem, which may have three or more
objective functions, for the bilevel problem (6.12) without any strong
conditions. It would also be interesting to identify special cases of the
nonlinear multilevel programming problem that are equivalent to
multicriteria programming problems.


6.7.3 Multilevel Multicriteria Programming Problems 


Most of the real-world optimization problems require more than one objec- 
tive function. Therefore, one of the challenging problems in mathematical 
programming is the multilevel multicriteria programming problem; that is, 
each decision maker has several objectives. The general bilevel multicriteria 
programming problem has the form 


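$$\min_{x}\ F(x,y) = (F_1(x,y), \ldots, F_l(x,y)) \qquad (6.14)$$

where y is Pareto optimal for

$$\min_{z}\ f(x,z) = (f_1(x,z), \ldots, f_k(x,z))$$
$$\text{s.t. } g(x,z) \le 0,$$
$$x \in X \subset R^{n},\quad y, z \in Y \subset R^{m},$$

where $X \subset R^{n}$ and $Y \subset R^{m}$ are compact sets,
$G : R^{n} \times R^{m} \to R^{p}$, $g : R^{n} \times R^{m} \to R^{q}$, and
$F_i, f_j : R^{n} \times R^{m} \to R$, $i = 1,\ldots,l$ and $j = 1,\ldots,k$,
are scalar functions.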

Definition 6.6. A point $y^* \in Y$ with $f(x,y^*)$ is called Pareto optimal
for the lower-level multicriteria problem, if and only if there exists no
point $z \in Y$ such that $f_i(x,z) \le f_i(x,y^*)$ for all $i = 1,2,\ldots,k$
and $f_j(x,z) < f_j(x,y^*)$ for at least one index $j \in \{1,2,\ldots,k\}$.
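
For a finite candidate set, the dominance test in Definition 6.6 can be checked directly. The following is a minimal sketch, assuming the lower-level feasible set has already been discretized into a list of candidates; the names `dominates`, `pareto_optimal`, and the demo objective are illustrative and do not come from the survey.

```python
from typing import Callable, List, Sequence, Tuple


def dominates(fz: Sequence[float], fy: Sequence[float]) -> bool:
    """True if fz is no worse than fy in every objective and strictly
    better in at least one (the condition excluded by Definition 6.6)."""
    no_worse = all(a <= b for a, b in zip(fz, fy))
    strictly_better = any(a < b for a, b in zip(fz, fy))
    return no_worse and strictly_better


def pareto_optimal(candidates: List[Tuple[float, ...]],
                   f: Callable[[float, Tuple[float, ...]], Tuple[float, ...]],
                   x: float) -> List[Tuple[float, ...]]:
    """Return the candidates y that no other candidate z dominates,
    for a fixed upper-level decision x."""
    values = [tuple(f(x, y)) for y in candidates]
    return [y for y, fy in zip(candidates, values)
            if not any(dominates(fz, fy) for fz in values)]


if __name__ == "__main__":
    # Hypothetical two-objective lower-level problem, for illustration only.
    demo_f = lambda x, y: (x + y[0], y[1])
    points = [(1, 3), (2, 2), (3, 1), (3, 3), (2, 3)]
    print(pareto_optimal(points, demo_f, 0.5))
```

The pairwise comparison is quadratic in the number of candidates, which is adequate for small illustrations but not for large discretizations.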

Solving the bilevel multicriteria problem (6.14) means we look for Pareto
optimal solutions of the problem (6.14):

Definition 6.7. A point $(x^*,y^*) \in X \times Y$ with $F(x^*,y^*)$, where
$y^*$ is Pareto optimal for the lower-level problem, is called Pareto optimal
for (6.14), if and only if there exists no point $(x,y) \in X \times Y$,
where y is Pareto optimal for the lower-level problem, such that
$F_i(x,y) \le F_i(x^*,y^*)$ for all $i = 1,2,\ldots,l$ and
$F_j(x,y) < F_j(x^*,y^*)$ for at least one index $j \in \{1,2,\ldots,l\}$.


To our knowledge, this problem has not been studied by many researchers. 
Recently, Sinha and Sinha [81] studied multilevel decentralized programming 
problems in which decision makers have absolute control over certain deci- 
sion variables but some variables may be controlled by two or more decision 
makers. In this case, the problem has conflicting objectives, but the decision 
makers are placed in hierarchical order. 

We have briefly discussed optimality conditions for multilevel program- 
ming in Section 6.3. Many researchers also considered optimality conditions 
and duality results for multicriteria programming problems including Preda 
[78], Jeyakumar and Mond [57], Liang et al. [64], and Chinchuluun et al. [28]. 
Based on the optimality conditions and duality results of both multicriteria
programming and multilevel programming, optimality conditions and duality
results for multilevel multiobjective programming can also be studied.


6.8 Concluding Remarks 


In this brief survey, we have shown a number of theoretical results for de- 
terministic multilevel programming. These results include complexity issues, 
optimality conditions, and algorithmic approaches for solving multilevel pro- 
gramming problems. We have also presented a method, which is called the 
multivariate partition approach, for solving single-level mathematical pro- 
gramming problems based on their equivalent bilevel programming formula- 
tions. Some open questions have also been included at the end of the chapter. 
This survey is not comprehensive; we have not focused on stochastic issues 
and connection with decomposition methods.


Chapter 7


Central Path Curvature and 
Iteration-Complexity for Redundant 
Klee—Minty Cubes 


Antoine Deza, Tamas Terlaky, and Yuriy Zinchenko 


Summary. We consider a family of linear optimization problems over the
n-dimensional Klee–Minty cube and show that the central path may visit
all of its vertices in the same order as simplex methods do. This is achieved
by carefully adding an exponential number of redundant constraints that
force the central path to take at least $2^{n} - 2$ sharp turns. This fact
suggests that any feasible path-following interior-point method will take at
least order $2^{n}$ iterations to solve this problem, whereas in practice
typically only a few iterations (e.g., 50) suffice to obtain a high-quality
solution. Thus, the construction potentially exhibits the worst-case
iteration-complexity known to date, which almost matches the theoretical
iteration-complexity bound for this type of method. In addition, this
construction gives a counterexample to a conjecture that the total central
path curvature is $O(n)$.


Key words: Linear programming, central path, interior-point methods, to- 
tal curvature 


7.1 Introduction 


Consider the following linear programming problem: $\min c^{T}x$ such that
$Ax \ge b$, where $A \in R^{m \times n}$, $b \in R^{m}$, and $c, x \in R^{n}$.

In theory, the so-called feasible path-following interior-point methods
exhibit polynomial iteration-complexity: starting at a point on the central
path they take at most $O(\sqrt{m}\ln\nu)$ iterations to attain a
$\nu$-relative decrease in the duality gap. Moreover, if L is the bit-length
of the input data, it takes at most $O(\sqrt{m}L)$ iterations to solve the
problem exactly; see, for instance, [11]. However, in practice typically only
a few iterations, usually fewer than 50, suffice to obtain a high-quality
solution. This remarkable difference stands behind the tremendous success of
interior-point methods in applications.

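Let $\psi : [\alpha,\beta] \to R^{n}$ be a $C^{2}$ map with nonzero derivative
$\forall t \in [\alpha,\beta]$. Denote its arc length by

$$l(t) := \int_{\alpha}^{t} \|\dot\psi(\tau)\|\,d\tau,$$

its parametrization by the arc length by
$\psi_{\rm arc}(l) : [0, l(\beta)] \to R^{n}$, and its curvature at the
point $l$,

$$\kappa(l) := \ddot\psi_{\rm arc}(l).$$

The total curvature K is defined as

$$K := \int_{0}^{l(\beta)} \|\kappa(l)\|\,dl.$$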
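
The total curvature can be approximated numerically by sampling the curve and summing the turning angles between consecutive chords; for a smooth curve this sum approaches the integral above as the sampling is refined. The sketch below is only an illustration under that assumption; the helper names `sample` and `total_curvature` are not from the chapter.

```python
import math
from typing import Callable, List, Sequence


def sample(psi: Callable[[float], Sequence[float]],
           alpha: float, beta: float, m: int) -> List[Sequence[float]]:
    """Evaluate psi at m + 1 equally spaced parameter values in [alpha, beta]."""
    return [psi(alpha + (beta - alpha) * i / m) for i in range(m + 1)]


def total_curvature(points: List[Sequence[float]]) -> float:
    """Sum of turning angles between consecutive segments of a polyline.

    Assumes consecutive sample points are distinct (no zero-length segment).
    """
    total = 0.0
    for p, q, r in zip(points, points[1:], points[2:]):
        u = [b - a for a, b in zip(p, q)]
        v = [b - a for a, b in zip(q, r)]
        norm_u = math.sqrt(sum(c * c for c in u))
        norm_v = math.sqrt(sum(c * c for c in v))
        cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
        # Clamp to [-1, 1] to guard against rounding before calling acos.
        total += math.acos(max(-1.0, min(1.0, cos_angle)))
    return total


if __name__ == "__main__":
    quarter = sample(lambda t: (math.cos(t), math.sin(t)), 0.0, math.pi / 2, 2000)
    print(total_curvature(quarter))   # close to pi / 2 for a quarter circle
    line = sample(lambda t: (t, 2 * t, -t), 0.0, 1.0, 50)
    print(total_curvature(line))      # close to 0 for a straight segment
```

A quarter circle turns through a right angle and a straight segment does not turn at all, which matches the intuition that K measures how far a curve is from being a straight line.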


Intuitively, the total curvature is a measure of how far off a certain curve 
is from being a straight line. Thus, it has been hypothesized that the total 
curvature of the central path is positively correlated with the number of 
iterations that any Newton-like path following method will take to traverse 
this curve, in particular, the number of iterations for feasible path-following 
interior-point methods, for example, long-step or predictor-corrector. 

The worst-case behavior of path-following interior-point methods has
already been under investigation; for example, Todd and Ye [13] gave a lower
iteration-complexity bound of order $\sqrt[3]{m}$ necessary to guarantee a
fixed decrease in the central path parameter and consequently in the duality
gap.
At the same time, different notions for the curvature of the central path 
have been examined. The relationship between the number of approximately 
straight segments of the central path introduced by Vavasis and Ye [14] and 
a certain curvature measure of the central path introduced by Sonnevend, 
Stoer, and Zhao [12] and further analyzed in [15], was further studied by 
Monteiro and Tsuchiya in [9]. Dedieu, Malajovich, and Shub [1] investigated a 
properly averaged total curvature of the central path. Nesterov and Todd [10] 
studied the Riemannian curvature of the central path in particular relevant 
to the so-called short-step methods. We follow a constructive approach orig- 
inated in [4, 5] which is driven by the geometrical properties of the central 
path to address these questions. 

We consider a family of linear optimization problems over the n-dimensional
Klee–Minty cube and show that the central path may visit all of its vertices
in the same order as simplex methods do. This is achieved by carefully adding
an exponential number of redundant constraints that force the central path
to take at least $2^{n} - 2$ sharp turns. We derive explicit formulae for the
number of the redundant constraints needed. In particular, we give a bound of
$O(n2^{3n})$ on the number of redundant constraints when the distances to
those are chosen uniformly. When these distances are chosen to decay
geometrically, we give a slightly tighter bound of the same order
$n^{3}2^{2n}$ as in [5].

The behavior of the central path suggests that any feasible path-following
interior-point method will take at least order $2^{n}$ iterations to solve
this problem. Thus, the construction potentially exhibits the worst-case
iteration-complexity known to date, which almost matches the theoretical
iteration-complexity bound for this type of method. However, state-of-the-art
linear optimization solvers that include preprocessing of the problem as
described in [6, 7] are expected to recognize and remove the redundant
constraints in no more than two passes. This underlines the importance of
implementing efficient preprocessing algorithms.

We show that the total curvature of the central path for the construction 
is at least exponential in n and, therefore, provides a counterexample to a 
conjecture of Dedieu and Shub [2] that it can be bounded by O(n). Also, 
the construction may serve as an example where one can relate the total 
curvature and the number of iterations almost exactly. 

The chapter is organized as follows. In Section 7.2 we introduce the family
of linear programming problems studied, along with a set of sufficient
conditions that ensure the desired behavior of the central path, and give a
lower bound on the total curvature of the central path. In Section 7.3 we
outline the approach to determine the number of the redundant constraints
required. Sections 7.4 and 7.5 contain a detailed analysis of the two
distinct models for the distances to the redundant constraints. We give a
brief conclusion in Section 7.6.


7.2 Sufficient Conditions for Bending the Central Path 
and the Total Curvature 


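Let $x \in R^{n}$. Consider the following optimization problem.

$$\begin{array}{rll}
\min & x_n & \\
 & 0 \le x_1 \le 1 & \\
 & \varepsilon x_{k-1} \le x_k \le 1 - \varepsilon x_{k-1}, & k = 2,\ldots,n \\
 & 0 \le d_1 + x_1 & \text{repeated } h_1 \text{ times} \\
 & \varepsilon x_1 \le d_2 + x_2 & \text{repeated } h_2 \text{ times} \\
 & \quad\vdots & \\
 & \varepsilon x_{n-1} \le d_n + x_n & \text{repeated } h_n \text{ times.}
\end{array}$$

The feasible region is the Klee–Minty n-cube and is denoted by
$C \subset R^{n}$. Denote by $d := (d_1,\ldots,d_n) \in R^{n}_{+}$ the vector
containing the distances to the redundant constraints from C, and by
$h := (h_1,\ldots,h_n) \in N^{n}$ the vector containing the numbers of the
redundant constraints.

By analogy with the unit cube $[0,1]^{n}$, we denote the vertices of C as
follows. For $S \subset \{1,\ldots,n\}$, a vertex $v^{S}$ of C satisfies

$$v_1^{S} = \begin{cases} 1 & \text{if } 1 \in S \\ 0 & \text{otherwise,} \end{cases}
\qquad
v_k^{S} = \begin{cases} 1 - \varepsilon v_{k-1}^{S} & \text{if } k \in S \\ \varepsilon v_{k-1}^{S} & \text{otherwise,} \end{cases}
\quad k = 2,\ldots,n.$$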
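
The recursive vertex definition is easy to evaluate. The sketch below computes $v^{S}$ for a given index set and lists all vertices by decreasing last coordinate; ordering by $x_n$ is used here only as an illustrative shortcut for recovering a path of adjacent vertices with decreasing objective value, not as a description of simplex pivoting. The function names are assumptions made for this example.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple


def vertex(S: Iterable[int], n: int, eps: float) -> Tuple[float, ...]:
    """Coordinates of the Klee-Minty vertex v^S (indices in S are 1-based)."""
    members = set(S)
    coords = []
    prev = 0.0  # convention x_0 = 0, so the k = 1 case gives 1 or 0
    for k in range(1, n + 1):
        prev = 1.0 - eps * prev if k in members else eps * prev
        coords.append(prev)
    return tuple(coords)


def vertices_by_objective(n: int, eps: float) -> List[FrozenSet[int]]:
    """All index sets S, ordered by decreasing objective value x_n."""
    subsets = [frozenset(c)
               for r in range(n + 1)
               for c in combinations(range(1, n + 1), r)]
    return sorted(subsets, key=lambda S: vertex(S, n, eps)[-1], reverse=True)


if __name__ == "__main__":
    for S in vertices_by_objective(3, 0.25):
        print(sorted(S), vertex(S, 3, 0.25))
```

For n = 3 the listing starts at the vertex indexed by {3} and ends at the origin, and consecutive index sets differ in exactly one element, so consecutive vertices share an edge of the cube.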


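Define the δ-neighborhood $N_\delta(v^{S})$ of a vertex $v^{S}$, with the
convention $x_0 = 0$, by

$$N_\delta(v^{S}) := \left\{ x \in C :
\begin{array}{ll}
1 - x_k - \varepsilon x_{k-1} \le \varepsilon^{k-1}\delta & \text{if } k \in S \\
x_k - \varepsilon x_{k-1} \le \varepsilon^{k-1}\delta & \text{otherwise}
\end{array},\ k = 1,\ldots,n \right\}.$$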


Remark 7.1. Observe that for the neighborhoods $N_\delta(v^{S})$,
$S \subset \{1,\ldots,n\}$, to be pairwise disjoint it suffices that
$\varepsilon + \delta < 1/2$: given $\varepsilon,\delta > 0$, the shortest
among all n coordinates’ distance between the neighborhoods, equal to
$(1 - 2\varepsilon - 2\varepsilon\delta)$, is attained along the second
coordinate and must be positive, which is readily implied.


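For brevity of the notation we introduce slack variables corresponding to
the constraints in the problem above as follows:

$$\begin{array}{ll}
s_1 = x_1 & \\
s_k = x_k - \varepsilon x_{k-1} & k = 2,\ldots,n \\
\bar s_1 = 1 - x_1 & \\
\bar s_k = 1 - \varepsilon x_{k-1} - x_k & k = 2,\ldots,n \\
\tilde s_1 = d_1 + x_1 & \\
\tilde s_k = d_k + (x_k - \varepsilon x_{k-1}) & k = 2,\ldots,n.
\end{array}$$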


Recall that the analytic center χ corresponds to the unique maximizer

$$\arg\max_{x} \sum_{i=1}^{n} \left(\ln s_i + \ln \bar s_i + h_i \ln \tilde s_i\right).$$

Also, recall that the primal central path P can be characterized as the
closure of the set of maximizers

$$\left\{ x \in R^{n} : x = \arg\max_{x:\ x_n = \alpha} \sum_{i=1}^{n} \left(\ln s_i + \ln \bar s_i + h_i \ln \tilde s_i\right),\ \text{for some } \alpha \in (0, \chi_n) \right\}.$$

Therefore, setting to 0 the derivatives of
$\sum_{i=1}^{n} (\ln s_i + \ln \bar s_i + h_i \ln \tilde s_i)$ with respect
to $x_n$,

$$\frac{1}{s_n} - \frac{1}{\bar s_n} + \frac{h_n}{\tilde s_n} = 0, \qquad (7.1)$$

and with respect to $x_k$,

$$\frac{1}{s_k} - \frac{\varepsilon}{s_{k+1}} - \frac{1}{\bar s_k} - \frac{\varepsilon}{\bar s_{k+1}} + \frac{h_k}{\tilde s_k} - \frac{h_{k+1}\varepsilon}{\tilde s_{k+1}} = 0, \quad k = 1,\ldots,n-1, \qquad (7.2)$$

combined give us necessary and sufficient conditions for $x = \chi$.
Furthermore, (7.2) combined with $x_n = \alpha \in (0, \chi_n)$ gives us
necessary and sufficient conditions for $x \in P \setminus (\{0\} \cup \{\chi\})$,
where $0 \in R^{n}$ denotes the origin.

Given $\varepsilon,\delta > 0$, the sufficient conditions for
$h = h(d,\varepsilon,\delta)$ to guarantee that the central path P visits the
(disjoint) δ-neighborhoods of each vertex of C may be summarized in the
following proposition. We write $\mathbf{1}$ for the vector of all ones in
$R^{n}$.


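Proposition 7.1. Fix $\varepsilon,\delta > 0$. Denote for $k = 2,\ldots,n$

$$I_\delta^{k} := \{ x \in C : \bar s_k \ge \varepsilon^{k-1}\delta,\ s_k \ge \varepsilon^{k-1}\delta \}$$

and

$$B_\delta^{k} := \{ x \in C : \bar s_{k-1} \le \varepsilon^{k-2}\delta,\ s_{k-2} \le \varepsilon^{k-3}\delta,\ \ldots,\ s_1 \le \delta \}.$$

If $h = h(d,\varepsilon,\delta) \in N^{n}$ satisfies

$$Ah \ge \frac{3}{\delta}\mathbf{1} \qquad (7.3)$$

and

$$\frac{h_k \varepsilon^{k-1}}{d_k + 1} \ge \frac{h_{k+1}\varepsilon^{k}}{d_{k+1}} + \frac{3}{\delta}, \quad k = 1,\ldots,n-1, \qquad (7.4)$$

where

$$A = \begin{pmatrix}
\frac{1}{d_1+1} & -\frac{\varepsilon}{d_2} & 0 & \cdots & 0 & 0 \\
-\frac{1}{d_1} & \frac{2\varepsilon}{d_2+1} & -\frac{\varepsilon^{2}}{d_3} & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
-\frac{1}{d_1} & 0 & \cdots & \frac{2\varepsilon^{k-1}}{d_k+1} & -\frac{\varepsilon^{k}}{d_{k+1}} & 0 \\
\vdots & & & \ddots & \ddots & \vdots \\
-\frac{1}{d_1} & 0 & \cdots & 0 & 0 & \frac{2\varepsilon^{n-1}}{d_n+1}
\end{pmatrix},$$

then

$$I_\delta^{k} \cap P \subset B_\delta^{k}.$$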


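Proof. Fix $k \ge 2$ and let $x \in I_\delta^{k} \cap P$.

Let $j \le k - 2$. Summing up all of the ith equations of (7.2) over
$i = j,\ldots,(k-2)$, each multiplied by $\varepsilon^{i-1}$, and then
subtracting the $(k-1)$st equation multiplied by $\varepsilon^{k-2}$, we have

$$-\frac{2h_{k-1}\varepsilon^{k-2}}{\tilde s_{k-1}} + \frac{h_j\varepsilon^{j-1}}{\tilde s_j} + \frac{h_k\varepsilon^{k-1}}{\tilde s_k} + \frac{\varepsilon^{j-1}}{s_j} + \frac{\varepsilon^{k-1}}{s_k} + \frac{\varepsilon^{k-1}}{\bar s_k}
= \frac{2\varepsilon^{k-2}}{s_{k-1}} + \frac{\varepsilon^{j-1}}{\bar s_j} + 2\sum_{i=j}^{k-3}\frac{\varepsilon^{i}}{\bar s_{i+1}}.$$

Because $\tilde s_{k-1} \le d_{k-1} + 1$, $\tilde s_k \ge d_k$,
$\tilde s_j \ge d_j$, and $\bar s_k \ge \varepsilon^{k-1}\delta$,
$s_k \ge \varepsilon^{k-1}\delta$ as $x \in I_\delta^{k}$, from the above we
get

$$\frac{2h_{k-1}\varepsilon^{k-2}}{d_{k-1}+1} - \frac{h_j\varepsilon^{j-1}}{d_j} - \frac{h_k\varepsilon^{k-1}}{d_k} \le \frac{\varepsilon^{j-1}}{s_j} + \frac{2}{\delta}.$$

From (7.4) it follows that $(h_1/d_1) \ge (h_j\varepsilon^{j-1}/d_j)$, thus
we can write

$$-\frac{h_1}{d_1} + \frac{2h_{k-1}\varepsilon^{k-2}}{d_{k-1}+1} - \frac{h_k\varepsilon^{k-1}}{d_k} \le \frac{\varepsilon^{j-1}}{s_j} + \frac{2}{\delta};$$

that is, as
$(3/\delta) \le -(h_1/d_1) + ((2h_{k-1}\varepsilon^{k-2})/(d_{k-1}+1)) - (h_k\varepsilon^{k-1}/d_k)$
by (7.3), we have

$$s_j \le \varepsilon^{j-1}\delta, \quad \forall j \le k-2.$$

In turn, the $(k-1)$st equation of (7.2),

$$\frac{h_{k-1}\varepsilon^{k-2}}{\tilde s_{k-1}} - \frac{h_k\varepsilon^{k-1}}{\tilde s_k} = \frac{\varepsilon^{k-2}}{\bar s_{k-1}} + \frac{\varepsilon^{k-1}}{\bar s_k} + \frac{\varepsilon^{k-1}}{s_k} - \frac{\varepsilon^{k-2}}{s_{k-1}},$$

implies

$$\frac{h_{k-1}\varepsilon^{k-2}}{d_{k-1}+1} - \frac{h_k\varepsilon^{k-1}}{d_k} \le \frac{\varepsilon^{k-2}}{\bar s_{k-1}} + \frac{2}{\delta},$$

and since
$(3/\delta) \le ((h_{k-1}\varepsilon^{k-2})/(d_{k-1}+1)) - (h_k\varepsilon^{k-1}/d_k)$
by (7.4), we have

$$\bar s_{k-1} \le \varepsilon^{k-2}\delta.$$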


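Proposition 7.2. Fix $\varepsilon,\delta > 0$. If $h \in N^{n}$ satisfies
(7.3) and (7.4), then $\chi \in N_\delta(v^{\{n\}})$.


Proof. Summing up all of the ith equations of (7.2) over
$i = k,\ldots,(n-1)$, each multiplied by $\varepsilon^{i-1}$, and then
subtracting (7.1) multiplied by $\varepsilon^{n-1}$, we have

$$\frac{\varepsilon^{k-1}}{s_k} - \frac{\varepsilon^{k-1}}{\bar s_k} + \frac{h_k\varepsilon^{k-1}}{\tilde s_k} - \frac{2\varepsilon^{n-1}}{s_n} - 2\sum_{i=k}^{n-2}\frac{\varepsilon^{i}}{\bar s_{i+1}} - \frac{2h_n\varepsilon^{n-1}}{\tilde s_n} = 0,$$

implying

$$\frac{2h_n\varepsilon^{n-1}}{\tilde s_n} - \frac{h_k\varepsilon^{k-1}}{\tilde s_k} \le \frac{\varepsilon^{k-1}}{s_k}.$$

Because $\tilde s_n \le d_n + 1$, $\tilde s_k \ge d_k$ and by (7.4),
$(h_1/d_1) \ge (h_k\varepsilon^{k-1}/d_k)$, from the above we get

$$\frac{2h_n\varepsilon^{n-1}}{d_n+1} - \frac{h_1}{d_1} \le \frac{\varepsilon^{k-1}}{s_k};$$

combined with $((2h_n\varepsilon^{n-1})/(d_n+1)) - (h_1/d_1) \ge (3/\delta)$
(from (7.3)) this leads to

$$s_k \le \varepsilon^{k-1}\delta, \quad k = 1,\ldots,n-1.$$

In turn, (7.1) implies
$(h_n\varepsilon^{n-1}/\tilde s_n) \le (\varepsilon^{n-1}/\bar s_n)$. And
because $\tilde s_n \le d_n + 1$ and, by (7.3) and (7.4),
$(h_n\varepsilon^{n-1}/(d_n+1)) \ge (3/\delta)$, we have
$(3/\delta) \le (\varepsilon^{n-1}/\bar s_n)$; that is,
$\bar s_n \le (\varepsilon^{n-1}/3)\,\delta$.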


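Corollary 7.1. Fix $\varepsilon,\delta > 0$ such that
$\varepsilon + \delta < 1/2$. If $h \in N^{n}$ satisfies (7.3) and (7.4),
then the central path P intersects the disjoint δ-neighborhoods of all the
vertices of C. Moreover, P is confined to a polyhedral tube defined by

$$T_\delta := \bigcup_{k=1}^{n} \left( \left( \bigcap_{j=k+1}^{n} (I_\delta^{j})^{c} \right) \cap B_\delta^{k} \right)$$

with the convention $B_\delta^{1} = C$.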


Remark 7.2. Observe that $T_0$ is the sequence of connected edges of C
starting from $v^{\{n\}}$ and terminating at $v^{\emptyset}$, and is
precisely the path followed by the simplex method on the original Klee–Minty
problem as it pivots along the edges of C.


For simplicity of the notation we write T instead of $T_\delta$ when the
choice of δ is clear. For a fixed δ, we define a turn of T adjacent to a
vertex $v^{S}$ (or corresponding to $N_\delta(v^{S})$ if the δ-neighborhoods
are disjoint, because in the latter case $N_\delta(v^{S})$ determines $v^{S}$
uniquely) to be the angle between the two edges of C that belong to $T_0$
and connect at this vertex.

Intuitively, if a smooth curve is confined to a narrow tube that makes a 
sharp turn, then the curve itself must at least make a similar turn and thus 
have a total curvature bounded away from zero. It might be worthwhile to 
substantiate this intuition with a proposition. 


Proposition 7.3. Let $\Psi : [0,T] \to R^{2}$ be $C^{2}$, parameterized by
its arc length t, such that
$\Psi([0,T]) \subset \{(x,y) : 0 \le x \le a+b,\ 0 \le y \le b\} \cup \{(x,y) : a \le x \le a+b,\ -a \le y \le b\}$
and $\Psi(0) \in \{0\} \times [0,b]$, $\Psi(T) \in [a,a+b] \times \{-a\}$.
Then the total curvature K of Ψ satisfies $K \ge \arcsin(1 - 2b^{2}/a^{2})$.


Proof. By the mean-value theorem, for any τ such that $\Psi_1(\tau) = a$ we
have $\tau \ge a$; recall that $\|\dot\Psi\| = 1$. Thus, by the same theorem,
$\exists t_1$ such that $|\dot\Psi_2(t_1)| \le b/a$. Similarly, $\exists t_2$
such that $|\dot\Psi_1(t_2)| \le b/a$. Now map the values of the derivative
of Ψ at $t_1$ and $t_2$ onto a sphere and recall that the total curvature
K between these two points corresponds to the length of a connecting curve
on the sphere, thus bounded below by the length of the geodesic (which in
this case is the same as the angular distance).
A simple calculation completes the proof.


Fig. 7.1 Total curvature and geodesics.


Remark 7.3. Note that if $b/a \to 0$, then the corresponding lower bound on
the total curvature K approaches $\pi/2$.


Next we construct a simple bound on the total curvature of P by picking
suitable d, ε, and finally δ small enough, together with h, that results in a
“narrow” polyhedral tube T.

For $X \subset R^{n}$ denote its orthogonal projection onto a linear subspace
spanned by a subset $S \subset \{1,\ldots,n\}$ of coordinates, with
coordinates corresponding to $S^{c}$ suppressed, by $X_S$. For
$x, z \in R^{n}$ we denote by $(x, z)$ the straight line segment connecting
the points x and z.


Corollary 7.2. Fix $n \ge 2$. If $d_i = (n-1)2^{n-i+3}$, $i = 1,\ldots,n$,
$$\varepsilon = \frac{n-1}{2n}, \qquad \delta = \frac{1}{32n^{2}}\left(\frac{4}{5}\right)^{n-2},$$

and h satisfies

$$h = \left\lfloor \left(1 + \frac{\delta}{3}\max_i \sum_j |a_{ij}|\right)\frac{3}{\delta}\,\tilde h \right\rfloor,$$

where $A\tilde h = \mathbf{1}$, then the total curvature of the central path
P satisfies

$$K \ge \frac{1}{2n}\left(\frac{8}{5}\right)^{n-2}.$$


Fig. 7.2 Planar projection of the central path for n = 3.


Proof. That $\varepsilon,\delta,h = h(d,\varepsilon,\delta)$ above satisfies
the conditions of Corollary 7.1, and thus P is confined to the polyhedral
tube T, is established in Section 7.5.

Instead of analyzing $P \in R^{n}$ directly we derive the lower bound on the
total curvature of P based on its planar projection $P_{\{1,2\}}$.

From $I_\delta^{k} \cap P \subset B_\delta^{k}$, $k = 2,\ldots,n$, it follows
that $P_{\{1,2\}}$ will traverse the two-dimensional Klee–Minty cube
$C_{\{1,2\}}$ at least $2^{n-2}$ times, every time originating in either
$N_\delta(v^{\emptyset})_{\{1,2\}}$ or $N_\delta(v^{\{2\}})_{\{1,2\}}$ and
terminating in the other neighborhood, while confined to the polyhedral tube
$T_{\{1,2\}} = (\{\bar s_2 \le \varepsilon\delta\} \cup \{\bar s_1 \le \delta\} \cup \{s_2 \le \varepsilon\delta\}) \cap C_{\{1,2\}}$.
Thus, $P_{\{1,2\}}$ will make at least $2^{n-1}$ “sharp turns”, each
corresponding to a turn in $N_\delta(v^{\{1,2\}})_{\{1,2\}}$ or
$N_\delta(v^{\{1\}})_{\{1,2\}}$.
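In order to understand how the turns of $P_{\{1,2\}}$ contribute to the total
curvature of P we need the following lemma.

Lemma 7.1. Let $\tilde u, \tilde v \in R^{3}$ and
$u = (\tilde u_{\{1,2\}}, 0)$, $v = (\tilde v_{\{1,2\}}, 0)$. If the angle

$$\xi := \pi - \arccos \min_{\substack{w \in {\rm span}\{u,v\},\ \tilde w \in {\rm span}\{\tilde u,\tilde v\}, \\ \|w\| = \|\tilde w\| = 1}} \tilde w^{T} w$$

between the hyperplane spanned by $u, v$ and the hyperplane spanned by
$\tilde u, \tilde v$ does not exceed $\arcsin\varepsilon$, then the angle
$\tilde\alpha$ between $\tilde u$ and $\tilde v$ satisfies

$$\frac{\cos\alpha - \varepsilon^{2}\left(\frac{1-\cos\alpha}{2}\right)}{1 + \varepsilon^{2}\left(\frac{1-\cos\alpha}{2}\right)} \le \cos\tilde\alpha \le \frac{\cos\alpha + \varepsilon^{2}\left(\frac{1+\cos\alpha}{2}\right)}{1 + \varepsilon^{2}\left(\frac{1+\cos\alpha}{2}\right)},$$

where α is the angle between u and v.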


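Proof. Without loss of generality we may assume $\|u\| = \|v\| = 1$ with

$$u_1 = \sin\frac{\alpha}{2} = \sqrt{\frac{1-\cos\alpha}{2}}, \qquad v_1 = -\sin\frac{\alpha}{2} = -\sqrt{\frac{1-\cos\alpha}{2}},$$
$$u_2 = \cos\frac{\alpha}{2} = \sqrt{\frac{1+\cos\alpha}{2}}, \qquad v_2 = \cos\frac{\alpha}{2} = \sqrt{\frac{1+\cos\alpha}{2}},$$

and, assuming that the angle ξ is precisely $\arcsin\varepsilon$,
parameterize ${\rm span}\{\tilde u,\tilde v\}$ by ${\rm span}\{u,v\}$ and
$z = (z_1, z_2, 0)$ such that $\|z\| = 1$, writing
$\tilde x \in {\rm span}\{\tilde u,\tilde v\}$ as
$\tilde x = (x_1, x_2, x_{\{1,2\}}^{T} z_{\{1,2\}}\,\varepsilon)$.

Introducing β such that $z_1 = \cos\beta$ and $z_2 = \sin\beta$ we have

$$\tilde u = \left(\sqrt{\frac{1-\cos\alpha}{2}},\ \sqrt{\frac{1+\cos\alpha}{2}},\ \varepsilon\cos\left(\beta - \frac{\pi}{2} + \frac{\alpha}{2}\right)\right),$$
$$\tilde v = \left(-\sqrt{\frac{1-\cos\alpha}{2}},\ \sqrt{\frac{1+\cos\alpha}{2}},\ \varepsilon\cos\left(\beta - \frac{\pi}{2} - \frac{\alpha}{2}\right)\right),$$

and, therefore,

$$\cos\tilde\alpha = \frac{\tilde u^{T}\tilde v}{\|\tilde u\|\,\|\tilde v\|} = \frac{\cos\alpha + \varepsilon^{2}\cos\left(\beta - \frac{\pi}{2} + \frac{\alpha}{2}\right)\cos\left(\beta - \frac{\pi}{2} - \frac{\alpha}{2}\right)}{\sqrt{\left(1 + \varepsilon^{2}\cos^{2}\left(\beta - \frac{\pi}{2} + \frac{\alpha}{2}\right)\right)\left(1 + \varepsilon^{2}\cos^{2}\left(\beta - \frac{\pi}{2} - \frac{\alpha}{2}\right)\right)}}.$$

Denoting $\gamma := \beta - (\pi/2)$ and differentiating the above with
respect to γ we get

$$(\cos\tilde\alpha)'_{\gamma} = \frac{(1+\varepsilon^{2})\left(-32\varepsilon^{2}\sin 2\gamma + 16\varepsilon^{2}\sin(2\gamma + 2\alpha) + 16\varepsilon^{2}\sin(2\gamma - 2\alpha)\right)}{D},$$

where

$$D = \left(16 + 8\varepsilon^{2}\cos(2\gamma - \alpha) + 16\varepsilon^{2} + 8\varepsilon^{2}\cos(2\gamma + \alpha) + 2\varepsilon^{4}\cos 2\alpha + 2\varepsilon^{4}\cos 4\gamma + 4\varepsilon^{4}\cos(2\gamma + \alpha) + 4\varepsilon^{4}\cos(2\gamma - \alpha) + 4\varepsilon^{4}\right)^{3/2}.$$

Setting the derivative to 0 and simplifying the numerator we obtain the
necessary condition for the extremum of $\cos\tilde\alpha$,

$$32\varepsilon^{2}(1+\varepsilon^{2})\sin 2\gamma\,(\cos 2\alpha - 1) = 0.$$

That is, $\gamma = k(\pi/2)$ for $k = 0, \pm1, \pm2$, and so on. In
particular, it follows that the minimum of $\cos\tilde\alpha$ is attained at
$\beta_{\min} = 0$ and the maximum is attained at $\beta_{\max} = \pi/2$. The
bounds are obtained by further substituting the critical values of β into
the expression for $\cos\tilde\alpha$ and observing the monotonicity with
respect to ξ.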
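
The two-sided bound on the cosine of the tilted angle can be sanity-checked numerically. The sketch below assumes the parametrization used in the proof: unit vectors u and v placed symmetrically about the second axis, lifted to the tilted plane by appending a third coordinate equal to ε times the inner product with a unit direction z = (cos β, sin β). The lower bound is taken as (cos α − ε²(1 − cos α)/2) / (1 + ε²(1 − cos α)/2) and the upper bound as (cos α + ε²(1 + cos α)/2) / (1 + ε²(1 + cos α)/2). The function names are illustrative only.

```python
import math
from typing import Tuple


def cos_tilted(alpha: float, beta: float, eps: float) -> float:
    """Cosine of the angle between the lifted vectors for tilt direction beta."""
    u = (math.sin(alpha / 2), math.cos(alpha / 2))
    v = (-math.sin(alpha / 2), math.cos(alpha / 2))
    z = (math.cos(beta), math.sin(beta))
    # Third coordinates of the lifted vectors (slope eps along z).
    u3 = eps * (u[0] * z[0] + u[1] * z[1])
    v3 = eps * (v[0] * z[0] + v[1] * z[1])
    dot = u[0] * v[0] + u[1] * v[1] + u3 * v3
    return dot / math.sqrt((1 + u3 * u3) * (1 + v3 * v3))


def bounds(alpha: float, eps: float) -> Tuple[float, float]:
    """Lower and upper bounds on the tilted cosine, as stated in the lead-in."""
    c = math.cos(alpha)
    w_lo = eps ** 2 * (1 - c) / 2
    w_hi = eps ** 2 * (1 + c) / 2
    return (c - w_lo) / (1 + w_lo), (c + w_hi) / (1 + w_hi)


if __name__ == "__main__":
    alpha, eps = 2.0, 0.4
    lo, hi = bounds(alpha, eps)
    samples = [cos_tilted(alpha, 2 * math.pi * i / 360, eps) for i in range(360)]
    print(lo, min(samples), max(samples), hi)
```

In this setup the extreme values are reached at β = 0 and β = π/2, in agreement with the critical points identified in the proof.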


Although the full-dimensional tube T might make quite wide turns, the
projected tube $T_{\{1,2\}}$ is bound to make the same sharp turn equal to
$(\pi/2) + \arcsin(\varepsilon/\sqrt{1+\varepsilon^{2}})$ each time T passes
through the δ-neighborhood of a vertex $v^{S}$, $1 \in S$ (e.g., consider the
turn adjacent to $v^{\{1\}}$ for n = 3).

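For a moment, equip C and T with a superscript $\bar n$ to indicate the
dimension of the cube, that is, the largest number of linearly independent
vectors in ${\rm span}(\{v^{S} : v^{S} \in C^{\bar n}\})$. Recalling the
$C^{\bar n}$ defining constraints, namely
$\varepsilon x_{\bar n-1} \le x_{\bar n} \le 1 - \varepsilon x_{\bar n-1}$,
we note that by construction of the Klee–Minty cube, whenever we increase the
dimension from $\bar n$ to $\bar n + 1$, $C^{\bar n}$ is affinely transformed
into “top” and “bottom” $\bar n$-dimensional faces
$F^{\bar n+1}_{\rm top}$ and $F^{\bar n+1}_{\rm bottom}$ of $C^{\bar n+1}$;
that is,

$$F^{\bar n+1}_{\rm top} = \begin{pmatrix} I \\ (0,\ldots,0,-\varepsilon) \end{pmatrix} C^{\bar n} + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \qquad
F^{\bar n+1}_{\rm bottom} = \begin{pmatrix} I \\ (0,\ldots,0,\varepsilon) \end{pmatrix} C^{\bar n},$$

where I is the identity $\bar n \times \bar n$ matrix, and $C^{\bar n+1}$ is
the convex hull of $F^{\bar n+1}_{\rm top}$ and $F^{\bar n+1}_{\rm bottom}$.
Consequently, any two-dimensional space spanned by two connected edges of
$C^{\bar n+1}$ from $T_0^{\bar n+1} \cap F^{\bar n+1}_{\rm top}$ or
$T_0^{\bar n+1} \cap F^{\bar n+1}_{\rm bottom}$ is obtained by tilting the
two-dimensional space spanned by the two corresponding edges of $C^{\bar n}$
from $T_0^{\bar n}$, lifted to $R^{\bar n+1}$ by setting the
$(\bar n+1)$st coordinate to zero, by an angle not exceeding
$\arcsin(\varepsilon/\sqrt{1+\varepsilon^{2}})$, and moreover, not exceeding
$\arcsin\varepsilon$. Therefore, we are in position to apply Lemma 7.1 to
bound how fast the cosine of a turn $\tilde\alpha^{S}$ of $T^{\bar n}$
adjacent to any $v^{S} \in C^{\bar n}$ with $1 \in S$ may approach its two
boundary values of 1 or −1 by induction on the dimension $\bar n$.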

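Fixing n = 3, $S \subset \{1,2,3\}$ such that $1 \in S$, adding and
subtracting 1 to $\cos\tilde\alpha^{S}$ we get

$$1 + \cos\tilde\alpha^{S} \ge \frac{1 + \cos\alpha^{S_{\{1,2\}}}}{1 + \varepsilon^{2}\left(\frac{1 - \cos\alpha^{S_{\{1,2\}}}}{2}\right)}$$

and

$$1 - \cos\tilde\alpha^{S} \ge \frac{1 - \cos\alpha^{S_{\{1,2\}}}}{1 + \varepsilon^{2}\left(\frac{1 + \cos\alpha^{S_{\{1,2\}}}}{2}\right)}.$$

Furthermore, for any $n \ge 3$ and $v^{S}$ with $1 \in S$ we can write

$$1 + \cos\tilde\alpha^{S} \ge \frac{1 + \cos\alpha^{S_{\{1,2\}}}}{(1+\varepsilon^{2})^{n-2}} \ge \frac{1 - \varepsilon}{(1+\varepsilon^{2})^{n-2}},$$
$$1 - \cos\tilde\alpha^{S} \ge \frac{1 - \cos\alpha^{S_{\{1,2\}}}}{(1+\varepsilon^{2})^{n-2}} \ge \frac{1 + (2\varepsilon/\sqrt{5})}{(1+\varepsilon^{2})^{n-2}},$$

recalling
$-2\varepsilon/\sqrt{5} \ge \cos\alpha^{S_{\{1,2\}}} = -\varepsilon/\sqrt{1+\varepsilon^{2}} \ge -\varepsilon$
because $\varepsilon \le 1/2$.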
Observe that by construction of a polyhedral tube T, a single linearly
connected component of
$T \setminus \left(\bigcup_{S \subseteq \{1,\ldots,n\}} N_\delta(v^{S})\right)$
may be uniquely identified with an edge $(v^{R}, v^{S})$,
$R, S \subset \{1,\ldots,n\}$, of C from $T_0$ by having a nonempty
intersection with this component, and thus we denote such a component by
$L_{(v^{R},v^{S})}$ and refer to it as a section of T corresponding to
$(v^{R}, v^{S})$. Moreover, recalling the definition of $N_\delta(v^{S})$ and
T, and noting that
$\sqrt{\delta^{2} + (\varepsilon\delta)^{2} + (\varepsilon^{2}\delta)^{2} + \cdots} \le \delta + \varepsilon\delta + \cdots \le 2\delta$
because $\varepsilon \le 1/2$, we get that within a given section of a tube
$L_{(v^{R},v^{S})}$ the Euclidean distance from any $x \in L_{(v^{R},v^{S})}$
to the compact closure of $(v^{R}, v^{S}) \cap L_{(v^{R},v^{S})}$ is bounded
from above by 2δ.

Let us consider what happens to the central path in the proximity of a
vertex $v^{S} \in C$ such that $1 \in S$. We do so by manufacturing a
surrogate for a part of T that is easier to analyze.

Fix $v^{S} \in C$ with $1 \in S$ and denote the two adjacent vertices to
which $v^{S}$ is connected by the two edges from $T_0$ by $v^{R}$ and
$v^{Q}$. Without loss of generality we may assume that

$$v^{R}_{\{1,2\}} = (0, 1),$$
$$v^{S}_{\{1,2\}} = (1, 1 - \varepsilon),$$
$$v^{Q}_{\{1,2\}} = (1, \varepsilon),$$

and $v^{R} \succ v^{S} \succ v^{Q}$, so that the central path P enters the
part of the polyhedral tube T sectioned between these three vertices via
$N_\delta(v^{R})$ and exits via $N_\delta(v^{Q})$.

Define four auxiliary points $\bar x, \bar z \in (v^{R}, v^{S})$ and $\underline{x}, \underline{z} \in (v^{S}, v^{Q})$ satisfying


Za} = (1— 36,1 —e + 306) + “EF (-1,6), 
Zi} = = (1 — eee l—e+ 3€0), 

£419} = (Ll—2—230), 
Z£41,2} = ( 7a) 
Fig. 7.3 Schematic drawing for the cylindrical tube segments.


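Because the distance from any point to the (part of the) identifying edge
of $L_{(v^{R},v^{S})}$ or $L_{(v^{S},v^{Q})}$ is no greater than 2δ and
because $(\cdot)_{\{1,2\}}$ corresponds to the orthogonal projection from
$R^{n}$ onto its first two coordinates, we can define two cylindrical tube
segments:

$$\overline{T} := \{x \in R^{n} : \min_{z \in (\bar x,\bar z)} \|x - z\| \le 2\delta\}
\cap \{x \in R^{n} : (\bar x - \bar z)^{T}x \le (\bar x - \bar z)^{T}\bar x\}
\cap \{x \in R^{n} : (\bar z - \bar x)^{T}x \le (\bar z - \bar x)^{T}\bar z\}$$

and

$$\underline{T} := \{x \in R^{n} : \min_{z \in (\underline{x},\underline{z})} \|x - z\| \le 2\delta\}
\cap \{x \in R^{n} : (\underline{x} - \underline{z})^{T}x \le (\underline{x} - \underline{z})^{T}\underline{x}\}
\cap \{x \in R^{n} : (\underline{z} - \underline{x})^{T}x \le (\underline{z} - \underline{x})^{T}\underline{z}\}$$

such that

$$\overline{T} \supset L_{(v^{R},v^{S})} \cap \{x \in R^{n} : (\bar x - \bar z)^{T}x \le (\bar x - \bar z)^{T}\bar x,\ (\bar z - \bar x)^{T}x \le (\bar z - \bar x)^{T}\bar z\},$$

$$\underline{T} \supset L_{(v^{S},v^{Q})} \cap \{x \in R^{n} : (\underline{x} - \underline{z})^{T}x \le (\underline{x} - \underline{z})^{T}\underline{x},\ (\underline{z} - \underline{x})^{T}x \le (\underline{z} - \underline{x})^{T}\underline{z}\},$$

and

$$\overline{T} \cap (N_\delta(v^{R}) \cup N_\delta(v^{S})) = \underline{T} \cap (N_\delta(v^{S}) \cup N_\delta(v^{Q})) = \emptyset.$$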

Therefore, P will traverse $\overline{T}$ and $\underline{T}$, first
entering $\overline{T}$ through its face corresponding to
$(\bar x - \bar z)^{T}x = (\bar x - \bar z)^{T}\bar x$ and exiting through
the face corresponding to
$(\bar z - \bar x)^{T}x = (\bar z - \bar x)^{T}\bar z$, and then entering
$\underline{T}$ at a point with
$(\underline{x} - \underline{z})^{T}x = (\underline{x} - \underline{z})^{T}\underline{x}$
and exiting through a point with
$(\underline{z} - \underline{x})^{T}x = (\underline{z} - \underline{x})^{T}\underline{z}$.

Now we choose a new system of orthogonal coordinates in $R^{n}$ that allows
us to apply an argument similar to that of Proposition 7.3 as follows. Let
the first two coordinates correspond to the linear subspace spanned by
$(\bar x, \bar z)$ and $(\underline{x}, \underline{z})$; align the second
coordinate axis with the vector $(\underline{x}, \underline{z})$, so that the
vector $(\bar x, \bar z)$ forms the same angle equal to $\tilde\alpha^{S}$
with the second coordinate axis as with $(\underline{x}, \underline{z})$.
Choose the remaining $(n - 2)$ coordinates so that they form an orthogonal
basis for $R^{n}$.

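Consider the parameterization of P by its arc length, $P_{\rm arc}$. Because
the shortest distance between the two parallel faces of $\overline{T}$ that
correspond to
$\{x \in R^{n} : (\bar x - \bar z)^{T}x = (\bar x - \bar z)^{T}\bar x\}$ and
$\{x \in R^{n} : (\bar z - \bar x)^{T}x = (\bar z - \bar x)^{T}\bar z\}$
is equal to $\|(\bar x, \bar z)\| = 1/2 - \varepsilon - 3\delta$, by the
mean-value theorem it takes at least $(1/2 - \varepsilon - 3\delta)$ change
of the arc length parameter for $P_{\rm arc}$ to traverse $\overline{T}$.
Noting that while traversing the tube $\overline{T}$ the second coordinate of
$P_{\rm arc}$ might change at most by
$2\cdot|2\delta\sin\tilde\alpha^{S}| + |(1/2 - \varepsilon - 3\delta)\cos\tilde\alpha^{S}|$,
by the same theorem we deduce that $\exists t_2$ such that

$$\left|\left(\dot P_{\rm arc}(t_2)\right)_2\right| \le \frac{2|2\delta\sin\tilde\alpha^{S}| + |(1/2 - \varepsilon - 3\delta)\cos\tilde\alpha^{S}|}{1/2 - \varepsilon - 3\delta} \le |\cos\tilde\alpha^{S}| + \frac{4\delta}{1/2 - \varepsilon - 3\delta}.$$

Analogously, considering $\underline{T}$ along the ith coordinate with
$i \ne 2$ we conclude that $\forall i \ne 2$, $\exists t_i$ such that

$$\left|\left(\dot P_{\rm arc}(t_i)\right)_i\right| \le \frac{4\delta}{1/2 - \varepsilon - 3\delta}.$$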
We use the points $t_1, t_2, \ldots, t_n$ to compute a lower bound on the
total curvature contribution of a turn of P next to $v^{S}$: recalling
$\|\dot P_{\rm arc}\| = 1$, the total curvature of the part of P that passes
through $\overline{T}$ and $\underline{T}$ (i.e., resulting from a turn of T
adjacent to $v^{S}$) may be bounded below by the length of the shortest curve
on a unit n-sphere that connects the points
$\dot P_{\rm arc}(t_1), \dot P_{\rm arc}(t_2), \ldots, \dot P_{\rm arc}(t_n)$
in any order. For simplicity, the latter length may be further bounded below
by


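$$K^{S} \ge \min_{\substack{x^{i} \in R^{n},\ i = 1,\ldots,n: \\ \|x^{i}\| = 1,\ \forall i, \\ |x^{2}_{2}| \le |\cos\tilde\alpha^{S}| + \frac{4\delta}{1/2 - \varepsilon - 3\delta}, \\ |x^{j}_{j}| \le \frac{4\delta}{1/2 - \varepsilon - 3\delta},\ j \ne 2}} \ \max_{j \ne 2}\ {\rm dist}(x^{2}, x^{j})
\ \ge \min_{\substack{x^{i} \in R^{n},\ i = 1,\ldots,n: \\ \|x^{i}\| = 1,\ \forall i, \\ |x^{2}_{2}| \le |\cos\tilde\alpha^{S}| + \frac{4\delta}{1/2 - \varepsilon - 3\delta}, \\ |x^{j}_{j}| \le \frac{4\delta}{1/2 - \varepsilon - 3\delta},\ j \ne 2}} \ \max_{j \ne 2}\ \|x^{2} - x^{j}\|,$$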
where dist(x, z) is the length of the shortest curve on a unit sphere between
points x and z, that is, the geodesic. Clearly, the critical value for the
last expression is attained, in particular, at $x^{i} \in R^{n}_{+}$,
$\forall i$, when $\|x^{i}\| = 1$, $\|x^{2} - x^{i}\| = \|x^{2} - x^{j}\|$,
$i, j \ne 2$, and

$$x^{2}_{2} = |\cos\tilde\alpha^{S}| + \frac{4\delta}{1/2 - \varepsilon - 3\delta}, \qquad x^{j}_{j} = \frac{4\delta}{1/2 - \varepsilon - 3\delta}, \quad j \ne 2.$$

It follows that

$$\sum_{j \ne 2} (x^{2}_{j})^{2} = 1 - \left(|\cos\tilde\alpha^{S}| + \frac{4\delta}{1/2 - \varepsilon - 3\delta}\right)^{2} \ge 1 - |\cos\tilde\alpha^{S}| - \frac{4\delta}{1/2 - \varepsilon - 3\delta}$$

and, because $|\cos\tilde\alpha^{S}| \le 1 - ((1-\varepsilon)/(1+\varepsilon^{2})^{n-2})$,


l-e 46 


142 
Ly2 
>) <— (+222 1/26 —30’ 


J 


resulting in 


8 
= 
IV 
— 
8 
an 
NS 
bo 
| 


1 Le 46 
m—1\(l+e7)r-2 1/2—e-36 


= 1 1 1 46 oc 
<—n—-1 2041/42 n—-1\1/f2—-e-35)’ 7-* 
Therefore, recalling ¢ = (n — 1) /2n and 6 = (1/32n?) (4/5)” 7, we can write 


KS > |e’ — 2? | 


> 13-25 

ee Ay? ia if 46 

= 9(n—1) \5 noi) i2=e—36 

ae ()" n (2) 1 

a _ 2 _ . n—-2 
2(n—1) \5 8n2(n — 1) \5 J ae (2) 

7 1 4 n—2 ry 4 n-2 56 

—~ 2(n—1) \5 8n2(n— 1) \5 


1 4 n—2 
>—{= : 
~ 4n \5 


Finally, recalling that the polyhedral tube T makes $2^{n-1}$ such turns, we
conclude that the total curvature of P indeed satisfies
$K \ge (1/2n)(8/5)^{n-2}$.

The bound on the total curvature K of P established above is obviously
not tight. We expect the true order of K to be $2^{n}$ up to a multiplier,
rational in n.


Remark 7.4. In R?, by combining the optimality conditions (7.1) and (7.2) 
for the analytic center y with that of the central path P visiting the 6- 
neighborhoods of the vertices vt} and v!?! one can show that for 6 below a 
certain threshold both $d_1$ and $d_2$ are bounded away from 0 by a constant.
In turn, this implies that for fixed feasible $d_1, d_2$, the necessary
conditions (7.1) and (7.2) for h chosen such that the central path visits the
δ-neighborhoods of all the vertices of C are “asymptotically equivalent” as
$\delta \downarrow 0$ to the sufficient conditions (7.3) and (7.4), up to a
constant multiplier. Here the term asymptotic equivalence refers to the
convergence of the normalized extreme rays and the vertices of the unbounded
polyhedra given by the set of necessary conditions for a fixed d to those of
the polyhedra given by the set of sufficient conditions (7.3) and (7.4).

This suggests that the following might be true. In $R^{n}$,

$$\min_{i=1,\ldots,n} d_i \ge \hat d > 0,$$

where $\hat d$ is independent of n, δ, ε. Moreover, the necessary conditions
for P to visit the δ-neighborhoods of all the vertices of C for a fixed d are
asymptotically equivalent as $\delta \downarrow 0$ to the sufficient
conditions (7.3) and (7.4). If, furthermore, we confine ourselves to only
bounded subsets of all such feasible
(d, h) corresponding to, say, 


n n 
dh < Hi := a 
ad = 


then the conditions (7.3) and (7.4) are tight, in the sense that if we denote
the set of all (d, h) satisfying the necessary conditions for P to visit all
the δ-neighborhoods intersected with $\{h : \sum_i h_i \le H_\delta\}$ as
${\rm Necc}_\delta$, and the set of all (d, h) satisfying (7.3) and (7.4)
intersected with $\{h : \sum_i h_i \le H_\delta\}$ as ${\rm Suff}_\delta$,
then for some small enough $\hat\delta$ there exist $M, m > 0$ independent of
ε such that

$${\rm Necc}_{m\delta} \subset {\rm Suff}_{\delta} \subset {\rm Necc}_{M\delta}, \quad 0 < \delta \le \hat\delta.$$


7.3 Finding $h \in \mathbb{N}^n$ Satisfying Conditions (7.3) and (7.4)


We write $f(n) \approx g(n)$ for $f(n), g(n) : \mathbb{N} \to \mathbb{R}$ if $\exists\, c, C > 0$ such that
$c f(n) \le g(n) \le C f(n)$, $\forall n$; the argument n is usually omitted from the notation.
Denote $b := (3/\delta)\mathbf{1}$. Let us first concentrate on finding $h \in \mathbb{N}^n$ such
that (7.3) holds. If the integrality condition on h is relaxed, a solution to (7.3)
can be found by simply solving $Ah = b$. Note that

$$\|A\|_{1,\infty} = \max_i \sum_{j=1}^n |a_{ij}|$$

is, in fact, small for d large componentwise and $\varepsilon < 1/2$. So to find an
integral h we can

• Solve $A\tilde h = (1+\gamma)b$ for some small $\gamma > 0$.

• Set $h = \lfloor \tilde h \rfloor$.

Observe that for h to satisfy (7.3), it is enough to require
$\max_i \big(A(\tilde h - h) - \gamma b\big)_i \le 0$. In turn, this can be satisfied by choosing $\gamma > 0$ such that

$$\gamma\,\frac{3}{\delta} \ge \max_i \sum_{j=1}^n |a_{ij}|.$$


In Section 7.3.1 we show how to solve this system of linear equations. 
In Section 7.3.2 we demonstrate that under some assumption on d, (7.4) is 
already implied by (7.3), and consequently the rounding of h will not cause 
a problem for (7.4) either. 
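
The relax-and-round step described above can be sketched numerically. The following is a minimal illustration only: the matrix `A` below is a small hypothetical stand-in with small row sums (the actual A of condition (7.3) depends on d and ε), and the names `round_down_solution`, `A`, and `delta` are assumptions introduced for this sketch.

```python
import numpy as np


def round_down_solution(A, delta):
    """Return an integral h with A h >= (3/delta) 1 via relax-and-round.

    A is assumed invertible; it stands in for the matrix of condition (7.3).
    """
    n = A.shape[0]
    b = (3.0 / delta) * np.ones(n)
    # gamma * (3/delta) >= max_i sum_j |a_ij| absorbs the rounding error,
    # because every entry of (h_relaxed - h) lies in [0, 1).
    gamma = (delta / 3.0) * np.abs(A).sum(axis=1).max()
    h_relaxed = np.linalg.solve(A, (1.0 + gamma) * b)
    h = np.floor(h_relaxed)
    return h, h_relaxed, gamma, b


# Hypothetical 3x3 matrix with small entries (illustration only).
A = 0.05 * np.eye(3) + 0.01 * np.array([[0.0, 1.0, 0.0],
                                        [-1.0, 0.0, 1.0],
                                        [0.0, -1.0, 0.0]])
h, h_relaxed, gamma, b = round_down_solution(A, delta=0.25)
```

The choice of γ is the smallest one allowed by the inequality above; any larger γ also works but inflates h.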

Remark 7.5. The choice of rounding down instead of rounding up is arbi- 
trary. 


7.3.1 Solving the Linear System 


Because $(1+\gamma)b = (1+\gamma)(3/\delta)\mathbf{1}$, we can first solve $Ah = \mathbf{1}$ and then scale
h by $(1+\gamma)(3/\delta)$. Our current goal is to find the solution to $Ah = \mathbf{1}$.

For an arbitrary invertible $B \in \mathbb{R}^{n\times n}$ and $y, z \in \mathbb{R}^n$ such that
$1 + z^T B^{-1} y \ne 0$, the solution to

$$(B + yz^T)x = b$$

can be written as

$$x = B^{-1}(b - \alpha y),$$

where

$$\alpha = \frac{z^T B^{-1} b}{1 + z^T B^{-1} y}$$

(for writing $(B + yz^T)x = Bx + y(z^Tx) = b$, denoting $\alpha := z^Tx$, we can
express $x = B^{-1}(b - \alpha y)$ and substitute this x into $(B + yz^T)x = b$ again to
compute $\alpha$).
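
The rank-one update formula above can be checked with a short script. This is a sketch under stated assumptions: the function name `solve_rank_one_update` and the numerical data are illustrative and are not taken from the chapter; only the formula for x and α is.

```python
import numpy as np


def solve_rank_one_update(B, y, z, b):
    """Solve (B + y z^T) x = b using only linear solves with B."""
    Binv_b = np.linalg.solve(B, b)
    Binv_y = np.linalg.solve(B, y)
    denom = 1.0 + z @ Binv_y
    if abs(denom) < 1e-14:
        raise ValueError("1 + z^T B^{-1} y is (numerically) zero")
    alpha = (z @ Binv_b) / denom
    # x = B^{-1} (b - alpha * y)
    return Binv_b - alpha * Binv_y, alpha


# Illustrative data: an upper-bidiagonal B, z = e_1, and y vanishing in
# its first entry (hypothetical numbers).
n = 4
B = np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -0.5), k=1)
y = -0.1 * np.array([0.0, 1.0, 1.0, 1.0])
z = np.array([1.0, 0.0, 0.0, 0.0])
b = np.ones(n)
x, alpha = solve_rank_one_update(B, y, z, b)
```

Two solves with B replace one solve with the updated matrix, which pays off when B is structured (for instance triangular) and easy to invert.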
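Denoting

$$B := \begin{pmatrix}
\frac{1}{d_1+1} & -\frac{\varepsilon}{d_2} & 0 & \cdots & 0 & 0\\
0 & \frac{2\varepsilon}{d_2+1} & -\frac{\varepsilon^2}{d_3} & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & \cdots & & \frac{2\varepsilon^{n-2}}{d_{n-1}+1} & -\frac{\varepsilon^{n-1}}{d_n}\\
0 & 0 & \cdots & & 0 & \frac{2\varepsilon^{n-1}}{d_n+1}
\end{pmatrix},$$

$$y^T := -\frac{1}{d_1}(0,1,1,\dots,1),$$

$$z^T := (1,0,0,\dots,0),$$

we can compute the solution to $(B + yz^T)h = Ah = \mathbf{1}$ as

$$h = B^{-1}\left(\mathbf{1} + \frac{\alpha}{d_1}(0,1,\dots,1)^T\right), \qquad (7.5)$$

where

$$\alpha = \frac{\sum_{j=1}^n B^{-1}_{1j}}{1 - \frac{1}{d_1}\sum_{j=2}^n B^{-1}_{1j}}. \qquad (7.6)$$

So to get the explicit formula for h we need to compute $B^{-1}$ and show that d
can be chosen such that $\alpha$ is well defined; that is, $1 - (1/d_1)\sum_{j=2}^n B^{-1}_{1j} \ne 0$.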
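In order to invert B, first note that it satisfies

$$B = \mathrm{Diag}\left(\frac{1}{d_1+1}, \frac{2\varepsilon}{d_2+1}, \frac{2\varepsilon^2}{d_3+1}, \dots, \frac{2\varepsilon^{n-1}}{d_n+1}\right)(I + S),$$

where the superdiagonal matrix $S \in \mathbb{R}^{n\times n}$ is such that

$$S_{ij} = \begin{cases}
-\dfrac{\varepsilon(d_1+1)}{d_2}, & i = 1,\ j = 2,\\[2mm]
-\dfrac{\varepsilon(d_i+1)}{2d_{i+1}}, & j = i+1,\ i = 2,\dots,n-1,\\[2mm]
0 & \text{otherwise.}
\end{cases}$$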

Recall that $(I+Z)^{-1} = I - Z + Z^2 - Z^3 + \cdots$ for any $Z \in \mathbb{R}^{n\times n}$ such that
this matrix-power series converges. In our case, the powers of S are easy to
compute: for $1 \le k \le n-1$,

$$(S^k)_{ij} = \begin{cases}
\prod_{m=i}^{i+k-1} S_{m,m+1}, & j = i+k,\ i = 1,\dots,n-k,\\
0 & \text{otherwise,}
\end{cases}$$

and $S^m = 0$ for all $m \ge n$, so the inverse of $(I + S)$ can be computed as
above and the inverse of B can be further computed by post-multiplying by
the inverse of

$$\mathrm{Diag}\left(\frac{1}{d_1+1}, \frac{2\varepsilon}{d_2+1}, \frac{2\varepsilon^2}{d_3+1}, \dots, \frac{2\varepsilon^{n-1}}{d_n+1}\right).$$
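
Because S is nilpotent, the series for $(I+S)^{-1}$ terminates after n terms, so the inversion is exact rather than approximate. The sketch below illustrates this; the function name and the numerical entries are hypothetical and are not the chapter's d and ε.

```python
import numpy as np


def inverse_via_nilpotent_series(diag_entries, S):
    """Invert B = Diag(diag_entries) (I + S) for strictly upper-triangular S."""
    n = S.shape[0]
    inv_I_plus_S = np.eye(n)
    term = np.eye(n)
    for _ in range(1, n):          # S^n = 0, so n - 1 correction terms suffice
        term = term @ (-S)         # term = (-S)^k
        inv_I_plus_S += term
    # B^{-1} = (I + S)^{-1} Diag^{-1}: post-multiply by the inverse diagonal.
    return inv_I_plus_S @ np.diag(1.0 / np.asarray(diag_entries))


# Illustrative values only.
diag_entries = [0.5, 0.4, 0.3, 0.2]
S = np.diag([-0.3, -0.2, -0.1], k=1)   # superdiagonal matrix
B = np.diag(diag_entries) @ (np.eye(4) + S)
B_inv = inverse_via_nilpotent_series(diag_entries, S)
```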


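Therefore, $B^{-1}$ is equal to

$$\begin{pmatrix}
d_1+1 & \frac{(d_1+1)(d_2+1)}{2d_2} & \frac{(d_1+1)(d_2+1)(d_3+1)}{4d_2d_3} & \frac{(d_1+1)\cdots(d_4+1)}{8d_2d_3d_4} & \cdots & \frac{\prod_{j=1}^n(d_j+1)}{2^{n-1}\prod_{j=2}^n d_j}\\[2mm]
0 & \frac{d_2+1}{2\varepsilon} & \frac{(d_2+1)(d_3+1)}{4d_3\varepsilon} & \frac{(d_2+1)\cdots(d_4+1)}{8d_3d_4\varepsilon} & \cdots & \frac{\prod_{j=2}^n(d_j+1)}{2^{n-1}\varepsilon\prod_{j=3}^n d_j}\\[2mm]
0 & 0 & \frac{d_3+1}{2\varepsilon^2} & \frac{(d_3+1)(d_4+1)}{4d_4\varepsilon^2} & \cdots & \frac{\prod_{j=3}^n(d_j+1)}{2^{n-2}\varepsilon^2\prod_{j=4}^n d_j}\\[2mm]
0 & 0 & 0 & \frac{d_4+1}{2\varepsilon^3} & \cdots & \frac{\prod_{j=4}^n(d_j+1)}{2^{n-3}\varepsilon^3\prod_{j=5}^n d_j}\\[2mm]
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\[2mm]
0 & 0 & 0 & 0 & \cdots & \frac{d_n+1}{2\varepsilon^{n-1}}
\end{pmatrix}.$$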


7.3.2 Partial Implication for Sufficient Conditions 


Observe that in order for $h \in \mathbb{R}^n_+$, we must have $\alpha > 0$ as in (7.6). For if not
(i.e., if $\alpha < 0$), then denoting

$$\beta_1 := \left(B^{-1}(0,1,\dots,1)^T\right)_1 = \sum_{j=2}^n B^{-1}_{1j}$$

and writing

$$\alpha = \frac{\beta_1 + d_1 + 1}{1 - \beta_1/d_1}, \qquad (7.7)$$

we see that $\alpha < 0$ forces $\beta_1 > d_1$ and hence $\alpha/d_1 < -1$,
and from (7.5) it follows that if $(\alpha/d_1) < -1$, then $h_2, h_3, \dots, h_n < 0$.

From now on we assume $\alpha > 0$ (in Sections 7.4.1, 7.5.1 we show how to
achieve this by choosing d appropriately). Note that in this case $(\alpha/d_1) > 1$.

Suppose $h \in \mathbb{N}^n$ is such that (7.3) holds. If, furthermore, $(h_i\varepsilon^{i-1})/d_i$
is dominated by $h_1/d_1$ for $i = 1,\dots,n$, then (7.3) already implies (7.4).
Therefore, it is left to show that d can be chosen such that $h = \lfloor \tilde h \rfloor$ satisfies
the domination condition above.

For this to hold it suffices 

ses EEE, 5 oh 


dy a di; : 
where h solves Ah = 1. The above is implied by 


h+1+6i+Fh—% ‘ (+ Seer to. 


pal : 
— ee 
dy, — d; : q 


because y > 0, 6 < 1/2, where 


B; = e (87 11),. 


This can be written as 


5 i-1 
pe Se! ag ee a ae ae ae i=2,...,n. 
dy 6 ; 


dy dy 
In particular, if we have 


ee 


we 7.8 
dy d;’ ) »n ( ) 


then the above inequality holds true if 
ee 
~ 6d, 6d, 


that is, because ¢ < 1/2, for d; > 0 fori =1,...,n. 
Finally, observe that if dj > d;,i > 2, and d, = O(2"), then the magnitude 


of h is primarily determined by a: recalling (7.5), (7.7), we write 
oOo


poifil (-%)s> 
ey | —— a! a 
ad dy : J dj +1 
1 
0 0 dj +1 
Boi fl Bo} 1 0 
=a ——_— + — Ar + = Pr 
dy : dy / d,+1]: dy : 
1 1 0 


Because d; > 0 for i = 1,...,n, we have (d; +1) /2d; > 4 and se ba 
(G1/d,) < ia, implying 


0 0 dj+1 
~ Bofl Bu 1 1 0 
h<a “a s 21d, +1) oP gn—1 
1 1 0 
(7.9) 
0 
~) = ; for large n. 
dy : 
1 


7.4 Uniform Distances to the Redundant Hyperplanes 


Clearly, many different choices for d are possible. In this section we explore
the uniform model for d; that is, $d_1 = d_2 = \cdots = d_n$.


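7.4.1 Bounding $\alpha$
For $\alpha > 0$ we need

$$\beta_1 < d_1.$$

Denoting

$$\xi := \frac{d_1+1}{2d_1},$$

the above can be written as

$$(d_1+1)\xi + (d_1+1)\xi^2 + \cdots + (d_1+1)\xi^{n-1} < d_1,$$

or, equivalently,

$$\xi + \xi^2 + \cdots + \xi^{n-1} < \frac{1}{2\xi},$$

same as

$$\frac{\xi(1-\xi^{n-1})}{1-\xi} < \frac{1}{2\xi}.$$

In other words, we want

$$p(\xi) = 1 - \xi - 2\xi^2 + 2\xi^{n+1} > 0.$$

Note that as $d_1 \to \infty$, $\xi \to \tfrac12$ and $p(\xi) \to 2\xi^{n+1} > 0$, so $p(\xi) > 0$ for $d_1$ large
enough.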
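
How large $d_1$ must be can be explored numerically. The sketch below is an illustration, not the chapter's procedure: it locates the root of p itself by bisection (rather than using the closed-form bound derived next), and it assumes the reading $\xi = (d_1+1)/(2d_1)$ and $\beta_1 = (d_1+1)(\xi + \cdots + \xi^{n-1})$ for uniform d. All function names are hypothetical.

```python
def q(xi, n):
    """p(xi) / (1 - xi) = 1 - 2 (xi^2 + ... + xi^n); same sign as p on (0, 1)."""
    return 1.0 - 2.0 * sum(xi ** k for k in range(2, n + 1))


def smallest_root_above_half(n, tol=1e-14):
    """Bisection for the unique root of p in (1/2, 3/4); requires n >= 2."""
    lo, hi = 0.5, 0.75            # q(lo) > 0 > q(hi) for every n >= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d1_threshold(n):
    """Value of d_1 at which xi = (d_1 + 1) / (2 d_1) reaches the root of p."""
    return 1.0 / (2.0 * (smallest_root_above_half(n) - 0.5))


def beta1_uniform(d, n):
    """beta_1 for d_1 = ... = d_n = d: (d + 1)(xi + ... + xi^(n-1))."""
    xi = (d + 1.0) / (2.0 * d)
    return (d + 1.0) * sum(xi ** k for k in range(1, n))
```

Since q is strictly decreasing in ξ on (0, 1), the root is unique, and $\beta_1 < d_1$ holds exactly when $d_1$ exceeds `d1_threshold(n)`.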
Observe that € > $ for any d; > 0, and 


p(s) = 34°49 <0 for n = 1,2,... 
= forn=1 
p" (5) = br ae >0 for n = 2,3,4 
<0 otherwise 
4 n ! 
oe (3) a ak eer >0 for k > 3, 


so 


2 
(pe) 20654) 06) -#(@) 00" ()¥ 


Thus, to guarantee p(€) > 0, it is enough to require 


p($+48) 20 


letting € = 5 + AE. 


Denoting 
zs n(n + 1) 
a i= —24 Qn-1 
~ 2(n + 1) 
b:= —-34 ari 
ek 
C= Fa 
and 
sian Ales 2<n<4 
Age = (7.10) 


», n>A 


—b—V 62-44% 
a 


2 
(the smallest positive root of $\tilde p(\tfrac12 + \Delta\xi) = 0$), we conclude that $\beta_1 < d_1$ as
long as

$$\xi = \frac{d_1+1}{2d_1} \le \frac12 + \Delta\xi^*.$$

That is,

$$d_1 = \frac{1}{2\Delta\xi} \ge \frac{1}{2\Delta\xi^*}. \qquad (7.11)$$

Note that for $d_1 = d_2 = \cdots = d_n$, (7.8) holds, so (7.4) is readily implied
by (7.3).


It is left to demonstrate how to choose $d_1$ satisfying the above to guarantee
a moderate growth in h as $n \to \infty$.


7.4.2 Picking a “Good” $d_1$
Note that as n — oo, we have 

Ag* > —— 

o> 3p 


by expanding the square root in (7.10) as a first-order Taylor series. Also, 


- 7 =n(g) 2H) 
and hence 
1 il 
12 = 56 


1 
In fact, for large n, p(€) ¥ p(€), for $ < € < 4 + AE*. In turn, p(€) is almost 
linear on this interval because, as n — 00 


ba -38 Ae 
(we compare the slope of p(4 + Ag) at A€ = 0 with its decrement as a 
function of Ag over [0, Aé*]). 

Recalling that the growth of h is primarily determined by a for large n 
(see (7.9)), our goal becomes to minimize a. From (7.7), (7.11), noting that 
(1 © d; for large n (also recall 3; < di), we get 


~ BE) P(E) 2A &-cB 
and, moreover, the right-hand side of this expression approximates $\alpha$ fairly
well. So, to approximately minimize h, we maximize


A 
ag: (e755) 


for 0 < AE < A€E*, which corresponds to setting 


As 
A = 
é 2 
thus resulting in 
a—>6-27" as n—- oo 


and 


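7.4.3 An Explicit Formula


For given $n, \varepsilon, \delta$, compute $\Delta\xi^*$ according to (7.10), and set $\Delta\xi = \Delta\xi^*/2$ and $d_1$ as
in (7.11). Compute the solution to $Ah = \mathbf{1}$ using (7.5), denoted here by $\tilde h$, where $\alpha$ is computed
according to (7.6). Set $\gamma = (\delta/3)\max_i \sum_{j=1}^n |a_{ij}|$ and, finally,

$$h = \left\lfloor (1+\gamma)\,\frac{3}{\delta}\,\tilde h \right\rfloor.$$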


From (7.9) it follows that (for large n) 


0 
1 
co ~ a | a _ 
ge =a 4=1),. >. 
1 


We are interested in picking $\varepsilon, \delta$ to minimize the total number of the
redundant constraints, $\sum_{i=1}^n h_i$. Recalling $\varepsilon + \delta < \tfrac12$, denoting
(it is natural to pick $\delta$ as close to $\tfrac12 - \varepsilon$ as possible, say $\delta = 0.999\,(\tfrac12 - \varepsilon)$).
In order to bound


g := min g(e) 
O0<e<4 


we first bound the minimizer «* of the above, next we bound the derivative 
of g(€), and finally using the mean-value theorem we bound g” itself. 
Observing 


(1=3¢e 2827 (ne ® = 1) = (4e — 3)" = 1) 
(1 — 2e)(1 —-n—e") +€(3 —4e)(e7-1 + "7 +--- 41) 
e™(1—e)(1— 2e)? 


3 — 4e Be er oa. Te, Ne 4 
ee —_ en —E E wee 
en(1—e\—2ee \ 2 WG =sde) " \2 


g(e)= 


and noting that in the expression above the second and the third summands 
are monotone-increasing functions of ¢ (for « € (0,1/2),n > 1), for n > 3 we 
can write e* € (e%,e”) with 


Lt. n—5/4 
2n 


because for n > 2 we have 


(1 —2e)(1 —n —e") +e(3 — 4e)(e™ 1 teh 74. + 1)| 


e=el 


(—4n? + 15n — 25) + (—16n? — 40n + 25) (234) 


= 4n(4n +5) <0, 


where the first summand in brackets has no real roots with respect to n and 
the second summand is negative for n > 2, so g’(e”) < 0 and thus g’(e) < 0 
for 0 <e<e*, and 
y._ nai 
"Qn 


because for n > 3, 
(1 — 2e)(1—n—e") +e(3—4e)(e™ +e" 7 +--- +1) 


n—1—n?—2n+2(%1)" 


n(n +1) 
_ na=1~((n=1)(n +3) (BP)" + ()") 
— n(n +1) 
so $g'(\varepsilon^U) > 0$ and thus $g'(\varepsilon) > 0$ for $\varepsilon^U < \varepsilon < 1/2$. Consequently, for
$\varepsilon \in (\varepsilon^L, \varepsilon^U)$ we have


g(e)= 


—n(1 — 2¢) _ —2n3 8n \" 
4n — 5 


scene] e"1—ell—2e2 m+ 


and, therefore, by the mean-value theorem, 


ge") + min g/(e)(e” —e”) < g* < gle”); 


e€[el ,cY] a a 


that is, 


eee 4n—5 8n \” l n? 8n \" 
5 \4n+5 4n — 5 A(n +1) \4n—5 
r An (4n—5 8n \" 
as 1 
5 (42) (<5) ) 


for n > 3. 
In turn, this results in )7j_, hy; = O(n2°”), and, in fact, )7"_, hi has the 
same order lower bound as well by the above, noting that 


8n — n 5 : 5/4 n 
(5) —e (+555) a 


as n — oo, for a suitably chosen small ¢ > 0. 


7.5 Geometrically Decaying Distances to the Redundant 
Hyperplanes 


Next we explore the geometric model for d: $d_i = w\,(1/\xi)^{n-i+1}$, $i = 1,\dots,n$.


7.5.1 Bounding $\alpha$

As in Section 7.4.1, we need to guarantee $\beta_1 < d_1$.
Firstly, we give a lower bound on (Ax) x=1,....n recursively defined by 
deiiti 
ho Aignce He peeing td (7.12) 
2dK41 


with Ag = 1, and where 
with some constant w.


We have d; = dy_;41 fori =1,...,n, and 
fen a. ag 1 Bi : 
—=--—= 1— A,_1), —=1-Ay,-i41, = 2 sax: 
ai a ( 1) ad +1 a Tr 


Note that to satisfy 8, < d, we necessarily must have 1 — A, < 1 for k = 
1,...,n—1. 
From (7.12) it follows that 


Ar ektl Ar ek+l . Ar ek+l 


k+1 9 * 2 w — 2 w ne 
and hence 
aA ~2 
An > sey — pea (1 + (22) + 28)? +2 2%), =D. 


> OR-1 — Qk? 


Observing A; = 3 (1 — (/w)) we can write the above inequality as 


Ss ~ k-2 
il E E . 

A —j|{1--—]|]-—; 2é)" 
Y= OF ( =) wok? d! Z 


IV 


1 ge 42e% 
at : — »_ (28) , k=2,...,n. 


i=0 
Now, for a to be positive, that is, for 


dial i Ag. 
eas (= A,a)=1-(1- Anat > =) 0, 


nr 


it suffices 


which is implied by 


If $\xi = \tfrac12$, the above translates into

$$\frac{1}{2^{n-1}} - \frac{n-1}{w\,2^{n-1}} > 0,$$

that is,

$$w > n-1,$$

resulting in

$$d_1 = w\,2^n > (n-1)2^n.$$

It is left to verify that $h_i\varepsilon^{i-1}/d_i$ is indeed dominated by $h_1/d_1$ for
$i = 1,\dots,n$, to ensure (7.4) as in Section 7.3.2. In particular, we
demonstrate (7.8), as in the case of uniform d. Recalling


fon dn, + 1 
een _ 
dy dn ( 
and 
a a | An-i41 for 
it immediately follows that 
Bi . Ba 
dy ~ dy’ 
Also observe 
Bi _ Bo — Bs 
ie ae 


because, recalling (7.12) and 0 < Ay, <1,k =1,...,n— 


1 


2 


2— Ay 


1 
(1 — Agsi) — (1 — Ag) = (2 Ap + = ) 1+ Ay 
2 k+l 
A; — “A; 
at sg 
2 Qdi41 


7.5.2 Picking a “Good” w 


As in Section 7.4.2, we would like to minimize $\alpha$ with respect to w, which, in
the case of $\xi = \tfrac12$, can be well approximated from above by

$$(2d_1 + 1)\left(\frac{1}{2^{n-1}} - \frac{n-1}{w\,2^{n-1}}\right)^{-1}.$$

We look for

$$\min_{w > n-1}\ 2d_1\left(\frac{1}{2^{n-1}} - \frac{n-1}{w\,2^{n-1}}\right)^{-1},$$

that is,

$$\min_{w > n-1}\ \frac{w^2}{w-n+1},$$

or equivalently

$$\min_{w > n-1}\ \big(2\ln w - \ln(w-n+1)\big).$$

Setting the gradient to 0, we obtain

$$\frac{2}{w} - \frac{1}{w-n+1} = \frac{2(w-n+1) - w}{w(w-n+1)} = 0,$$

which gives us the minimizer

$$w = 2(n-1)$$
with the corresponding value of a © (n — 1)2?”*!. This results in 
vei, c—) eee) 
and . 
~ n2-" 
h, =O | ——— ], b=1,...,n. 
(a5) . 
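
The choice of w in this subsection can be sanity-checked numerically. The sketch below minimizes $w^2/(w-n+1)$ over $w > n-1$ by ternary search; the search interval, the value of n, and the function names are assumptions made for illustration only.

```python
def objective(w, n):
    """w^2 / (w - n + 1), defined for w > n - 1."""
    return w * w / (w - n + 1.0)


def minimize_unimodal(n, iterations=200):
    """Ternary search on (n - 1, 10 n); the objective is unimodal there."""
    lo, hi = (n - 1.0) + 1e-9, 10.0 * n
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, n) < objective(m2, n):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


n = 8
w_star = minimize_unimodal(n)
```

For n = 8 the search settles near 14 = 2(n - 1), with objective value close to 4(n - 1) = 28, in agreement with the stationarity condition above.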
7.5.3 An Explicit Formula


For given $n, \varepsilon, \delta$, set $d_i = (n-1)2^{n-i+2}$ for $i = 1,\dots,n$ and compute the
solution to $Ah = \mathbf{1}$ using (7.5). Set


5 “ 3~ 
j=l 
From (7.9) it follows that for large n 

0 
1 i-1 

er 1 

hi» —B! — = 

dy : <a(5) ae 

1 


We choose ¢,6, to minimize the total number of the redundant constraints, 


yr, hi. Recalling « + 6 < 1/2, for large n we can write 
= cal
1-—2e 1 36 (7.13) 
1 n 
= sh 92nt2 9 (s=) a 
a=) * "(6 — 1 
VT a= 
a _ Qn+2 ee) 
< 3(n—1)2 (0s 1 


In fact, we would expect € to be close to 1/2 in order for 5>;"_, hj to be 
minimized, so the last inequality also gives us a good approximation, namely 


Gri (7.14) 


Indeed, denoting 


and introducing 


Oe CnTF 


we can write the last two lines in (7.13) as 
3(m — 1)2°"¥2¢ F(C) < 3(n — 1)29"*? FC). 
Differentiating ¢f(¢) we get 


(n+1-—¢")(€+1)-2n ; n—-1 
——@a-or <0 ar 


because (n + 1—¢")(€ + 1) — 2n < (n+ 1)(€ +1) — 2n < 0 for such ¢, and 
therefore, the function is decreasing on this interval. So to maximize 


i hj 
3(n — 1)22r+2 


(CFO) = 


CFG) © 


we must take ¢ > (n— 1) /(n +1), thus justifying (7.14). 
Next we demonstrate how to minimize ye h; with respect to0 <ée< 
1/2. We note that the approximate minimum of 5*""_, h; corresponds to 
*:= min : 
ae ee (6) 


To analyze $f^*$ we proceed as follows: first we demonstrate that for n large
enough $f(\zeta)$ is convex on (0,1); then we produce lower and upper bounds
on the root of $f'(\zeta) = 0$ in this interval, thus bounding the minimizer; and
finally we compute upper and lower bounds on $f^*$ using Taylor expansions
of $f(\zeta)$.
Observe (n-2)¢-" pong 
ipax WR —n — 
and 
neo _ (n? + 6n +6)? — (2n? + 6n)¢ +n? +n —- 6¢"*? 
Lis eee Toye 
Denoting 
a:=n?+6n+6 
b := —(2n? + 6n) 
é:=n?t+n 


for f”(¢) to be positive on (0,1) it suffices to show that the minimum of 
ac? + b¢ +4 
is greater or equal than 6 (recall 0 < ¢ < 1, so that 6¢"*? < 6). In turn, 


_ 


min (a¢? + ¢ +2) = (ac? + ¢ +2) ay Ee 


so for f”(¢) > 0 on (0,1) it is enough to have 


(2n? + 6n)? é 
ae > 6. 
ieee) 


The above can be rewritten as 


4n® + 12n? + 24n 3n? 


a SG 
An2+6n+6)  n2+6n+6~— 


and is clearly implied by 
n—3>6. 


Therefore, f(¢) is convex on (0,1) as long as n > 9. 
Furthermore, at 
n—-1 


Gg 
¢ “nm +1 


we have 
On the other hand, at


we have 


and relying on all the derivatives of (1+ z)” for n € N being positive at z = 0, 


expanding 
n nm 1 n 
= 1 —_ 
(. _ :) ( F n— :) 


into second-order Taylor series, we get 


n n(n—1 n— 
(1+ iF of 3) (2 2) a n? —4n $0 
( a(n 1) (1- B5h)* © 


for n > 4. 
Now we are in position to give bounds on f* for n > 9: by convexity it 
follows that 
FO yer Ce =O ars es: 


that is, 


4 4n 


n+1\" (n+ 1)? 
Se ae 1 : 
srs(5) 
Consequently $7)", h; = O(n?2?") with the same order lower bound in the 
best case for e © (n— 1) /(2(n+1)). 


mtd) 7 ee. ee) 2 ae 
(ia) ( ) 


Remark 7.6. Similar analysis can be easily carried out for dy = w (1/é)* for 
k=1,...,n, with € = ¢. Because 
dn +1 
= ath (1 ~ An-1) 


Dd 
we 


ew — (e+ (22)? Dej (Ze)! + (2e)"1e) 
and is approximately minimized at w = 2 (c + (2e)? +S ey + (2e)"1e),
resulting in a ~ 4(2/e)” (c Adee Te (eye Cars); we have 


n n 7 _ 
ie ; h = Z 2 2 a n-1 
2 * oe (2) (+09 2 (6) + (2e) .) i 


_ 2n 1 n 2 I (Jey 
= 12n2 26) ey" (2: + (2e)” + 2(2e) ae) : 


Noting that 


—_— (2¢ + 26)” + 2(2e) 


9 1— (2e)"~? 2(n — 1) 
(1 — 22) (26) ) < 


1-2 ~ (1 = 2e)(2e)” 


for « € (0,1/2) and thus the latter may be bounded from above by the value 
of the right-hand side at ¢ = n/(2(n+ 1)), that is, 2e(n + 1)(n — 1), we get 


Shi < 24e(n + 1)(n — 1)n2?" = O(n32?"), 


i=l 


The resulting estimate for the number of the redundant constraints is not 
much different from the case of € = 1/2, so this model for d is not discussed 
here in any more details. 


7.6 Conclusions 


We provide sufficient conditions for the central path to intersect small neigh- 
borhoods of all the vertices of the Klee—Minty n-cube; see Propositions 7.1, 
7.2, and Corollary 7.1. More precisely, we derive explicit formulae for the 
number of redundant constraints for the Klee—Minty n-cube example given 
in [4, 5]. We give a smaller number of redundant constraints of order n2°” 
when the distances to those are chosen uniformly, as opposed to the previously 
established O(n?2°"). When these distances are chosen to decay geometri- 
cally, we give a slightly tighter bound of the same order n32?” as in [5], that 
results in a provably smaller number of constraints in practice. 

We argue that in R? the sufficient conditions presented are tight and in- 
dicate that the same is likely true in higher dimensions. 

Our construction potentially gives rise to linear programming instances 
that exhibit the worst case iteration-complexity for path-following interior- 
point methods, which almost matches its theoretical counterpart. 

Considering the n-dimensional simplex, Megiddo and Shub [8] demon- 
strated that the total curvature of the central path can be as large as order 
n. Combined with Corollary 7.2, it follows that the worst-case lower bound
on the total curvature of the central path is at least exponential in n up to 
a rational multiplier. We conjecture that the total curvature of the central 
path is O(m); see [3]. 
Chapter 8


Canonical Duality Theory: 
Connections between Nonconvex 
Mechanics and Global Optimization 


David Y. Gao and Hanif D. Sherali 


Dedicated to Professor Gilbert Strang on the occasion of his 70th birthday 


Summary. This chapter presents a comprehensive review and some new 
developments on canonical duality theory for nonconvex systems. Based on 
a tricanonical form for quadratic minimization problems, an insightful re- 
lation between canonical dual transformations and nonlinear (or extended) 
Lagrange multiplier methods is presented. Connections between complemen- 
tary variational principles in nonconvex mechanics and Lagrange duality in 
global optimization are also revealed within the framework of the canonical 
duality theory. Based on this framework, traditional saddle Lagrange duality 
and the so-called biduality theory, discovered in convex Hamiltonian systems 
and d.c. programming, are presented in a unified way; together, they serve 
as a foundation for the triality theory in nonconvex systems. Applications 
are illustrated by a class of nonconvex problems in continuum mechanics and 
global optimization. It is shown that by the use of the canonical dual trans- 
formation, these nonconvex constrained primal problems can be converted 
into certain simple canonical dual problems, which can be solved to obtain 
all extremal points. Optimality conditions (both local and global) for these 
extrema can be identified by the triality theory. Some new results on gen- 
eral nonconvex programming with nonlinear constraints are also presented 
as applications of this canonical duality theory. This review brings some fun- 
damentally new insights into nonconvex mechanics, global optimization, and 
computational science. 
8.1 Introduction


Complementarity and duality are two inspiring, closely related concepts. To- 
gether they play fundamental roles in multidisciplinary fields of mathematical 
science, especially in engineering mechanics and optimization. 

The study of complementarity and duality in mathematics and mechanics 
has had a long history since the well-known Legendre transformation was 
formally introduced in 1787. This elegant transformation plays a key role in 
complementary duality theory. In classical mechanical systems, each energy 
function defined in a configuration space is linked via the Legendre trans- 
formation with a complementary energy in the dual (source) space, through 
which the Lagrangian and Hamiltonian can be formulated. In static systems, 
the convex total potential energy leads to a saddle Lagrangian through which 
a beautiful saddle min-max duality theory can be constructed. This saddle 
Lagrangian plays a central role in classical duality theory in convex analy- 
sis and constrained optimization. In convex dynamic systems, however, the 
total action is usually a nonconvex d.c. function, that is, the difference of 
convex kinetic energy and total potential functions. In this case, the classical 
Lagrangian is no longer a saddle function, but the Hamiltonian is convex in 
each of its variables. It turns out that instead of the Lagrangian, the Hamilto- 
nian has been extensively used in convex dynamics. From a geometrical point 
of view, Lagrangian and Hamiltonian structures in convex systems and d.c. 
programming display an appealing symmetry, which was widely studied by 
their founders. Unfortunately, such a symmetry in nonconvex systems breaks 
down. It turns out that in recent times, tremendous effort and attention have 
been focused on the role of symmetry and symmetry-breaking in Hamilto- 
nian mechanics in order to gain a deeper understanding into nonlinear and 
nonconvex phenomena (see Marsden and Ratiu, 1995). 

The earliest examples of the Lagrangian duality in engineering mechanics 
are probably the complementary energy principles proposed by Haar and von 
Karman in 1909 for elastoperfectly plasticity and Hellinger in 1914 for contin- 
uum mechanics. Since the boundary conditions in Hellinger’s principle were 
clarified by E. Reissner in 1953 (see Reissner, 1996), the complementary— 
dual variational principles and methods have been studied extensively for 
more than 50 years by applied mathematicians and engineers (see Arthurs, 
1980, Noble and Sewell, 1972). The development of mathematical duality 
theory in convex variational analysis and optimization has had a similar his- 
tory since W. Fenchel proposed the well-known Fenchel transformation in 
1949. After the revolutionary concepts of superpotential and subdifferentials
were introduced by J. J. Moreau in 1966 in the study of frictional mechanics,
the modern mathematical theory of duality has been well developed by
celebrated mathematicians such as R. T. Rockafellar (1967, 1970, 1974), Moreau
(1968), Ekeland (1977, 2003), I. Ekeland and R. Temam (1976), F. H. Clarke
(1983, 1985), Auchmuty (1986, 2001), G. Strang (1979-1986), and Moreau,
Panagiotopoulos, and Strang (1988). Mathematically speaking, in linear
elasticity where the total potential energy is convex, the Hellinger–Reissner
complementary variational principle in engineering mechanics is equivalent to
a Fenchel–Moreau–Rockafellar type dual variational problem. The so-called
generalized complementary variational principle is actually the saddle
Lagrangian duality theory, which serves as the foundation for hybrid/mixed
finite element methods, and has been subjected to extensive study during
the past 40 years (see Strang and Fix (1973), Oden and Lee (1977), Pian
and Tong (1980), Pian and Wu (2006), Han (2005), and the references cited
therein).

1 Eric Reissner (PhD 1938) was a professor in the Department of Mathematics at MIT
from 1949 to 1969. According to Gil Strang, since Reissner moved to the Department of
Mechanical and Aerospace Engineering at University of California, San Diego in 1969, many
applied mathematicians in the field of continuum mechanics, especially solid mechanics,
switched from mathematical departments to engineering schools in the United States.

Early in the beginning of the last century, Haar and von Karman (1909) 
had already realized that in nonlinear variational problems of continuum me- 
chanics, the direct approaches for solving minimum potential energy (primal 
problem) can only provide upper bounding solutions. However, the minimum 
complementary energy principle (i.e., the maximum Lagrangian dual prob- 
lem) provides a lower bound (the mathematical proof of Haar-von Karman’s 
principle was given by Greenberg in 1949). In safety analysis of engineering 
structures, the upper and lower bounding approximations to the so-called col- 
lapse states of the elastoplastic structures are equally important to engineers. 
Therefore, the primal—dual variational methods have been studied extensively 
by engineers for solving nonsmooth nonlinear problems (see Gao, 1991, 1992, 
Maier, 1969, 1970, Temam and Strang, 1980, Casciaro and Cascini, 1982, 
Gao, 1986, Gao and Hwang, 1988, Gao and Cheung, 1989, Gao and Strang, 
1989b, Gao and Wierzbicki, 1989, Gao and Onate, 1990, Tabarrok and Rim- 
rott, 1994). The article by Maier et al. (2000) serves as an excellent survey on 
the developments for applications of the Lagrangian duality in engineering 
structural mechanics. In mathematical programming and computational sci- 
ence, the so-called primal—dual interior point methods are also based on the 
Lagrangian duality theory, which has emerged as a revolutionary technique 
during the last 15 years. Complementary to the interior-point methods, the 
so-called pan-penalty finite element programming developed by Gao in 1988 
(1988a,b) is indeed a primal-dual exterior-point method. He proved that in 
rigid-perfectly plastic limit analysis, the exterior penalty functional and the 
associated perturbation method possess an elegant physical meaning, which 
led to an efficient dimension rescaling technique in large-scale nonlinear mixed 
finite element programming problems (Gao, 1988b). 

In mathematical programming and analysis, the subject of complementar- 
ity is closely related to constrained optimization, variational inequality, and 
fixed point theory. Through the classical Lagrangian duality, the KKT condi- 
tions of constrained optimization problems lead to corresponding complemen- 
tarity problems. The primal—dual schema has continued to evolve for linear 
and convex mathematical programming during the past 20 years (see Walk,
1989, Wright, 1998). However, for nonconvex systems, it is well known that 
the KKT conditions are only necessary under certain regularity conditions 
for global optimality. Moreover, the underlying nonlinear complementarity 
problems are fundamentally difficult due to the nonmonotonicity of the non- 
linear operators, and also, many problems in global optimization are NP-hard. 
The well-developed Fenchel-Moreau-—Rockafellar duality theory will produce 
a so-called duality gap between the primal problem and its Lagrangian dual. 
Therefore, how to formulate perfect dual problems (with a zero duality gap) is 
a challenging task in global optimization and nonconvex analysis. Extensions 
of the classical Lagrangian duality and the primal—dual schema to nonconvex 
systems are ongoing research endeavors (see Aubin and Ekeland, 1976, Eke- 
land, 1977, Thach, 1993, 1995, Thach, Konno, and Yokota, 1996, Singer, 1998, 
Gasimov, 2002). On the flip side, the Hellinger–Reissner complementary
energy principle, emanating from large deformation mechanics, holds for both
convex and nonconvex problems. It is very interesting to note that around
the same time period of Reissner’s work, the generalized potential variational
principle in finite deformation elastoplasticity was proposed independently by
Hu Hai-chang (1955) and K. Washizu (1955). These two variational principles
are perfectly dual to each other (i.e., with zero duality gap) and play
important roles in large deformation mechanics and computational methods. The
inner relations between the Hellinger–Reissner and Hu–Washizu principles
were discovered by Wei-Zang Chien in 1964 when he proposed a systematic
method to construct generalized variational principles in solid mechanics (see
Chien, 1980).

Mechanics and mathematics have been complementary partners since 
Newton’s time, and the history of science shows much evidence of the bene- 
ficial influence of these disciplines on each other. However, the independent 
developments of complementary—duality theory in mathematics and mechan- 
ics for more than a half century have generated a “duality gap” between the 
two partners. In modern analysis, the mathematical theory of duality was 
mainly based on the Fenchel transformation. During the last three decades, 
many modified versions of the Fenchel-Moreau—Rockafellar duality have been 
proposed. One, the so-called relaxation method in nonconvex mechanics, can 
be used to solve the relaxed convex problems (see Atai and Steigmann, 1998, 
Dacorogna, 1989, Ye, 1992). However, due to the duality gap, these relaxed 
solutions do not directly yield real solutions to the nonconvex primal prob- 
lems. Thus, tremendous efforts have been focused recently on finding the 
so-called perfect duality theory in global optimization. On the other hand, it 
seems that most engineers and scientists prefer the classical Legendre trans- 
formation. It turns out that their attention has been mainly focused on how 
to use traditional Lagrange multiplier methods and complementary consti- 
tutive laws to correctly formulate complementary variational principles for 
numerical computational and application purposes. Although the generalized 
Hellinger–Reissner principle leads to a perfect duality between the nonconvex
potential variational problem and its complementary-dual, and has many
important consequences in large deformation theory and computational me- 
chanics, the extremality property of this well-known principle, as well as the 
Hu-Washizu principle, remained an open problem for more than 40 years, 
and this raised many arguments in large deformation theory and nonconvex 
mechanics (see Levinson, 1965, Veubeke, 1972, Koiter, 1976, Ogden, 1975, 
1977, Lee and Shield, 1980a,b, Guo, 1980). 

Actually, this open problem was partially solved in 1989 in the joint work 
of Gao and Strang (1989a) on nonconvex/nonsmooth variational problems. 
In order to recover the lost symmetry between the nonconvex primal problem 
and its dual, they introduced a so-called complementary gap function, which 
leads to a nonlinear Lagrangian duality theory in fully nonlinear variational 
problems. They proved that if this gap function is positive on a dual feasi- 
ble space, the generalized Hellinger-Reissner energy is a saddle-Lagrangian. 
Therefore, this gap function provides a sufficient condition in nonconvex vari- 
ational problems. However, the extremality conditions for negative gap func- 
tion were ignored until 1997 when Gao (1997) got involved with a project on 
postbuckling problems in nonconvex mechanics. He discovered that if this gap 
function is negative, the generalized Hellinger—Reissner energy (the so-called 
super-Lagrangian) is concave in each of its variables, which led to a biduality 
theory. Therefore, a canonical duality theory has gradually developed, first in 
nonconvex mechanics, and then in global optimization (see Gao, 1990-2005). 
This new theory is composed mainly of a potentially useful canonical dual 
transformation and an associated triality theory, whose components comprise 
a saddle min-max duality and two pairs of double-min, double-max dualities. 
The canonical dual transformation can be used to formulate perfect dual 
problems without a duality gap, whereas the triality theory can be used to 
identify both global and local extrema. 

The goal of this chapter is to present a comprehensive review on the canon- 
ical duality theory within a unified framework, and to expose its role in estab- 
lishing connections between nonconvex mechanics and global optimization. 
Applications to constrained nonconvex optimization problems are shown to 
reveal some important new results that are fundamental to global optimiza- 
tion theory. This chapter should be of interest to both the operations research 
and applied mathematics communities. In order to make this presentation 
easy to follow by interdisciplinary readers, our attention here is mainly fo- 
cused on smooth systems, although some concepts from nonsmooth analysis 
have been used in later sections. 


8.2 Quadratic Minimization Problems 


Let us begin with the simplest quadratic minimization problem (in short, the
primal problem $(\mathcal{P}_q)$):

$$(\mathcal{P}_q):\quad \min\left\{ P(u) = \tfrac{1}{2}\langle u, Au\rangle - \langle u, f\rangle \;:\; u \in \mathcal{U}_k \right\}, \tag{8.1}$$

where $\mathcal{U}_k$ is an open subset of a linear space $\mathcal{U}$; $A$ is a linear symmetrical
operator, which maps each $u \in \mathcal{U}$ into its dual space $\mathcal{U}^*$; the bilinear form
$\langle u, u^*\rangle : \mathcal{U} \times \mathcal{U}^* \to \mathbb{R}$ puts $\mathcal{U}$ and $\mathcal{U}^*$ in duality; $f \in \mathcal{U}^*$ is a given input, and
$P : \mathcal{U}_k \to \mathbb{R}$ represents the total cost (action) of the system. The criticality
condition $\delta P(u) = 0$ leads to a linear equation


Au=f, (8.2) 


which is called the fundamental equation (or equilibrium equation) in
mathematical physics. Because $A : \mathcal{U} \to \mathcal{U}^*$ is a symmetrical operator, we
have the following canonical decomposition,


$$A = \Lambda^* D \Lambda, \tag{8.3}$$


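where $\Lambda : \mathcal{U} \to \mathcal{V}$ is a so-called geometrical operator, which maps each $u \in \mathcal{U}$
into a so-called intermediate space $\mathcal{V}$, and the symmetrical operator $D$ links
$\mathcal{V}$ with its dual space $\mathcal{V}^*$. The bilinear form $\langle v ; v^*\rangle : \mathcal{V} \times \mathcal{V}^* \to \mathbb{R}$ puts
$\mathcal{V}$ and $\mathcal{V}^*$ in duality. We distinguish between the notations $\langle\,\cdot\,,\,\cdot\,\rangle$ and $\langle\,\cdot\,;\,\cdot\,\rangle$
according to the dual spaces $\mathcal{U} \times \mathcal{U}^*$ and $\mathcal{V} \times \mathcal{V}^*$ on which
they are respectively defined. The mapping $v^* = Dv \in \mathcal{V}^*$ is called the duality
equation. The adjoint operator $\Lambda^* : \mathcal{V}^* \to \mathcal{U}^*$, defined by


$$\langle \Lambda u ; v^*\rangle = \langle u, \Lambda^* v^*\rangle,$$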


is also called the balance operator. Thus, by the use of the intermediate pair
$(v, v^*)$, the fundamental equation (8.2) can be split into the so-called
tricanonical form

$$\left.\begin{array}{ll}
\text{(a) geometrical equation:} & \Lambda u = v \\
\text{(b) duality equation:} & D v = v^* \\
\text{(c) balance equation:} & \Lambda^* v^* = f
\end{array}\right\} \;\Longrightarrow\; \Lambda^* D \Lambda u = f. \tag{8.4}$$


In mathematical physics, the duality equation v* = Dv is also recognized as 
the constitutive law and the operator D depends on the physical properties 
of the system considered. 

The pair $(v, v^*)$ is said to be a canonical dual pair on $\mathcal{V}_a \times \mathcal{V}_a^* \subset \mathcal{V} \times \mathcal{V}^*$ if the
duality mapping $D : \mathcal{V}_a \subset \mathcal{V} \to \mathcal{V}_a^* \subset \mathcal{V}^*$ is one-to-one and onto. Generally
speaking, most physical variables appear in dual pairs; that is, there exists
a Gâteaux differentiable function $V : \mathcal{V}_a \to \mathbb{R}$ such that the duality relation
$v^* = \delta V(v) : \mathcal{V}_a \to \mathcal{V}_a^*$ is invertible, where $\delta V(v)$ represents the Gâteaux
derivative of $V$ at $v$. In mathematical physics, such a function is called free
energy. Its Legendre conjugate $V^*(v^*) : \mathcal{V}_a^* \to \mathbb{R}$, defined by the Legendre
transformation

$$V^*(v^*) = \operatorname{sta}\{\langle v ; v^*\rangle - V(v) \;:\; v \in \mathcal{V}_a\}, \tag{8.5}$$

is called complementary energy, where $\operatorname{sta}\{\,\}$ denotes finding stationary
points of the statement in $\{\,\}$. In order to study the canonical duality theory,
consider the following definition.


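Definition 8.1. A real-valued function $V : \mathcal{V}_a \subset \mathcal{V} \to \mathbb{R}$ is called a canonical
function on $\mathcal{V}_a$ if its Legendre conjugate $V^*(v^*)$ can be uniquely defined on
$\mathcal{V}_a^* \subset \mathcal{V}^*$ such that the following relations hold on $\mathcal{V}_a \times \mathcal{V}_a^*$:


$$v^* = \delta V(v) \;\Leftrightarrow\; v = \delta V^*(v^*) \;\Leftrightarrow\; \langle v ; v^*\rangle = V(v) + V^*(v^*). \tag{8.6}$$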


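Clearly, if $D : \mathcal{V}_a \to \mathcal{V}_a^*$ is invertible, the quadratic function $V(v) = \tfrac{1}{2}\langle v ; Dv\rangle$
is canonical on $\mathcal{V}_a$ and its Legendre conjugate $V^*(v^*) = \tfrac{1}{2}\langle D^{-1}v^* ; v^*\rangle$
is a canonical function on $\mathcal{V}_a^*$. Generally speaking, if $V : \mathcal{V}_a \to \mathbb{R}$ is a canonical
function and $v^* = \delta V(v)$, then $(v, v^*)$ is a canonical dual pair on $\mathcal{V}_a \times \mathcal{V}_a^*$. The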
one-to-one canonical duality relation serves as a foundation for the canonical 
dual transformation method reviewed in the following sections. The defini- 
tion of the canonical pairs and functions can be generalized to nonsmooth 
systems where the Fenchel transformation and subdifferential have to be ap- 
plied (see Gao, 2000a,c). This is discussed in the context of constrained global 
optimization problems in Section 8.8 of this chapter. 
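
The canonical duality relations (8.6) can be checked numerically for the quadratic case. The following is a minimal sketch, assuming a finite-dimensional setting with a hypothetical symmetric invertible matrix `D` and test point `v` (neither comes from the text); the bilinear form is taken to be the ordinary dot product.

```python
import numpy as np

def V(v, D):
    """Quadratic canonical function V(v) = 1/2 <v; Dv>."""
    return 0.5 * v @ D @ v

def V_star(v_star, D):
    """Legendre conjugate V*(v*) = 1/2 <D^{-1} v*; v*>, assuming D is invertible."""
    return 0.5 * v_star @ np.linalg.solve(D, v_star)

# Hypothetical data chosen only for illustration.
D = np.array([[2.0, 0.5],
              [0.5, 1.0]])
v = np.array([1.0, -3.0])

v_star = D @ v                        # duality equation: v* = dV(v) = D v
v_back = np.linalg.solve(D, v_star)   # inverse relation: v = dV*(v*) = D^{-1} v*

# Third relation in (8.6): <v; v*> - V(v) - V*(v*) should vanish.
residual = v @ v_star - (V(v, D) + V_star(v_star, D))
print(v_star, v_back, residual)
```

All three relations in (8.6) hold here because the map $v \mapsto Dv$ is one-to-one; with a singular `D` the call to `np.linalg.solve` would fail, which mirrors the loss of a uniquely defined conjugate.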

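In order to study general problems, we denote the linear function $\langle u, f\rangle$
by $U(u)$. If the feasible space $\mathcal{U}_k$ can be written in the form of


$$\mathcal{U}_k = \{u \in \mathcal{U}_a \mid \Lambda u \in \mathcal{V}_a\}, \tag{8.7}$$

then the problem $(\mathcal{P}_q)$ can be written in a general form

$$(\mathcal{P}):\quad \min\{P(u) = V(\Lambda u) - U(u) \;:\; u \in \mathcal{U}_k\}. \tag{8.8}$$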


This general form covers many problems in applications. In continuum
mechanics, the feasible set $\mathcal{U}_k$ is usually called the kinetically admissible space.
In statics, where the function $V(v)$ is viewed as an internal (or stored) energy
and $U(u)$ is considered as an external energy, the cost function $P(u)$ is the
so-called total potential and $(\mathcal{P})$ represents a minimal potential variational
problem. In dynamical systems, if $V(v)$ is considered as a kinetic energy and
$U(u)$ is the total potential, then $P(u)$ is called the total action of the system.
In this case, the variational problem associated with the general form $(\mathcal{P})$ is
the well-known least action principle. A diagrammatic representation of this
tricanonical decomposition is shown in Figure 8.1.

The development of the $\Lambda^* D \Lambda$-operator theory was apparently initiated
by von Neumann in 1932, and was subsequently extended and put into a more
general setting in the studies of complementary variational principles in
continuum mechanics by Rall (1969), Arthurs (1980), Tonti (1972a,b), Oden and
Reddy (1983), and Sewell (1987). In mathematical analysis, the tricanonical
form of $A = \Lambda^* D \Lambda$ has also been used to develop a mathematical theory
of duality by Rockafellar (1970), Ekeland and Temam (1976), Toland (1978,
1979), Auchmuty (1983), Clarke (1985), and many others. In the excellent
textbook by Strang (1986), the trifactorization $A = \Lambda^* D \Lambda$ for linear
operators can be seen through an application of continuum theories to discrete
systems. In what follows, we list some simple examples. More applications
can be found in the monograph Gao (2000a).

$$\begin{array}{ccccc}
u \in \mathcal{U}_a \subset \mathcal{U} & \longleftarrow & \langle u, u^*\rangle & \longrightarrow & \mathcal{U}^* \supset \mathcal{U}_a^* \ni u^* \\
\Lambda \big\downarrow & & & & \big\uparrow \Lambda^* \\
v \in \mathcal{V}_a \subset \mathcal{V} & \longleftarrow & \langle v ; v^*\rangle & \longrightarrow & \mathcal{V}^* \supset \mathcal{V}_a^* \ni v^*
\end{array}$$

Fig. 8.1 Diagrammatic representation for quadratic systems.


8.2.1 Quadratic Optimization Problems in $\mathbb{R}^n$


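First, we consider $\mathcal{U}$ as a finite-dimensional space such that $\mathcal{U} = \mathcal{U}^* = \mathbb{R}^n$.
Thus $A : \mathcal{U} \to \mathcal{U}^*$ is a symmetric matrix in $\mathbb{R}^{n \times n}$ and the bilinear form
$\langle u, u^*\rangle = u^T u^*$ is simply a dot product in $\mathbb{R}^n$. By linear algebra, the canonical
decomposition $A = \Lambda^* D \Lambda$ can be performed in many ways (see Strang, 1986),
where $\Lambda : \mathbb{R}^n \to \mathbb{R}^m$ is a matrix, $D : \mathbb{R}^m \to \mathbb{R}^m$ is a symmetrical matrix, and
$\Lambda^* = \Lambda^T$ maps $\mathcal{V}^* = \mathbb{R}^m$ back to $\mathcal{U}^* = \mathbb{R}^n$. The bilinear forms $\langle\,\cdot\,,\,\cdot\,\rangle$ and
$\langle\,\cdot\,;\,\cdot\,\rangle$ are simply dot products in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, that is,

$$\langle \Lambda u ; v^*\rangle = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n}\Lambda_{ij}u_j\Big)v_i^* = \sum_{j=1}^{n}u_j\Big(\sum_{i=1}^{m}\Lambda_{ij}v_i^*\Big) = \langle u, \Lambda^T v^*\rangle.$$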


If the matrix $A$ is positive semidefinite, we can always choose a geometrical
operator $\Lambda$ to ensure that the matrix $D \in \mathbb{R}^{m \times m}$ is positive definite. In this
case the problem $(\mathcal{P})$ is a convex program and any solution of the fundamental
equation $Au = f$ also solves the minimization problem $(\mathcal{P})$.
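
One possible way to build such a factorization is through the eigendecomposition of $A$. The sketch below is an illustration only: the helper name `canonical_factor`, the tolerance, and the sample matrix are assumptions, and the eigenvector-based choice of $\Lambda$ is just one of the many decompositions the text mentions.

```python
import numpy as np

def canonical_factor(A, tol=1e-12):
    """Return (Lam, D) with A = Lam.T @ D @ Lam and D positive definite,
    for a symmetric positive semidefinite A (eigendecomposition-based choice)."""
    w, Q = np.linalg.eigh(A)
    keep = w > tol                 # drop the (numerically) zero eigenvalues
    Lam = Q[:, keep].T             # m x n geometrical operator
    D = np.diag(w[keep])           # m x m diagonal, strictly positive
    return Lam, D

# Hypothetical singular PSD matrix and an input f in its range.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
f = np.array([1.0, 1.0])

Lam, D = canonical_factor(A)
u_bar = np.linalg.pinv(A) @ f      # one solution of A u = f

def P(u):
    """Primal cost P(u) = 1/2 <u, Au> - <u, f>."""
    return 0.5 * u @ A @ u - u @ f

print(Lam.shape, np.diag(D), u_bar, P(u_bar))
```

Here $m = 1 < n = 2$ because the zero eigenvalue is discarded, so $D$ is positive definite even though $A$ is only semidefinite.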

If the matrix $A$ is indefinite, the quadratic function $\tfrac{1}{2}\langle u, Au\rangle$ is nonconvex.
From linear algebra, it follows then that by choosing a particular linear
operator $\Lambda : \mathbb{R}^n \to \mathbb{R}^m$, the matrix $A$ can be written in the tricanonical form:

$$A = (\Lambda^T, I)\begin{pmatrix} D & 0 \\ 0 & -C \end{pmatrix}\begin{pmatrix} \Lambda \\ I \end{pmatrix}, \tag{8.9}$$

where $D \in \mathbb{R}^{m \times m}$ is positive definite, $C \in \mathbb{R}^{n \times n}$ is positive semidefinite,
and $I$ is an identity in $\mathbb{R}^n$. In this case, both $V(v) = \tfrac{1}{2}\langle v ; Dv\rangle$ and $U(u) =
\tfrac{1}{2}\langle u, Cu\rangle + \langle u, f\rangle$ are convex quadratic functions, but

$$P(u) = V(\Lambda u) - U(u) = \tfrac{1}{2}\langle \Lambda u ; D\Lambda u\rangle - \tfrac{1}{2}\langle u, Cu\rangle - \langle u, f\rangle$$

is a nonconvex d.c. function, that is, a difference of convex functions. In this
case, the problem $(\mathcal{P})$ is a nonconvex quadratic minimization and the solution
of $Au = f$ is only a critical point of $P(u)$.
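
The block form (8.9) can be illustrated by splitting an indefinite matrix into its positive and negative spectral parts. This is a hedged sketch: the function name `dc_split`, the tolerance, and the sample matrix are hypothetical, and the spectral split is only one admissible choice of $\Lambda$, $D$, and $C$.

```python
import numpy as np

def dc_split(A, tol=1e-12):
    """Split a symmetric indefinite A as A = Lam.T @ D @ Lam - C,
    with D positive definite and C positive semidefinite (spectral split)."""
    w, Q = np.linalg.eigh(A)
    pos, neg = w > tol, w < -tol
    Lam = Q[:, pos].T                               # m x n
    D = np.diag(w[pos])                             # m x m, positive definite
    C = Q[:, neg] @ np.diag(-w[neg]) @ Q[:, neg].T  # n x n, positive semidefinite
    return Lam, D, C

# Hypothetical indefinite matrix with eigenvalues -1 and 3.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
Lam, D, C = dc_split(A)

# Assemble the tricanonical block form (8.9) explicitly.
n, m = A.shape[0], D.shape[0]
left = np.hstack([Lam.T, np.eye(n)])                 # (Lam^T, I)
middle = np.block([[D, np.zeros((m, n))],
                   [np.zeros((n, m)), -C]])          # diag(D, -C)
A_block = left @ middle @ left.T
print(np.round(A_block, 12))
```

With this split, $V(\Lambda u)$ collects the convex part and $\tfrac{1}{2}\langle u, Cu\rangle$ the part that is subtracted, which is exactly the d.c. structure of $P(u)$.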

Nonconvex quadratic programming and d.c. programming are important 
from both the mathematical and application viewpoints. Sahni (1974) first 
showed that for a negative definite matrix A, the problem (P) is NP-hard. 
This result was also proved by Vavasis (1990, 1991) and by Pardalos (1991). 
During the last decade, several authors have shown that the general quadratic 
programming problem (P) is an NP-hard problem in global optimization (cf. 
Murty and Kabadi, 1987, Horst et al., 2000). It was shown by Pardalos and 
Vavasis (1991) that even when the matrix A is of rank one with exactly one 
negative eigenvalue, the problem is NP-hard. Much effort has been devoted to
solving this difficult problem during the last decade. Comprehensive
surveys have been given by Floudas and Visweswaran (1995) for quadratic 
programming, and by Tuy (1995) for d.c. optimization. 


8.2.2 Variational Problems in Continuum Mechanics 


In continuous systems the linear space $\mathcal{U}$ is usually a function space over a
time-space domain, and the linear mapping $A$ is a differential operator. In
classical Newtonian dynamics, for example, the fundamental equation (8.2)
is a second-order differential equation

$$Au = -m u'' = f,$$

where $f$ is an applied force field. In this case, $\Lambda = d/dt$ is a linear differential
operator, $m > 0$ is a mass density, and $\Lambda^* = -d/dt$ can be defined by
integrating by parts over a time domain $T \subset \mathbb{R}$ with boundary $\partial T$:

$$\langle \Lambda u ; v^*\rangle = \int_T u' v^* \, dt = \int_T u\,(-v^*)' \, dt = \langle u, \Lambda^* v^*\rangle,$$

subject to the boundary conditions $u(t)v^*(t) = 0$, $\forall t \in \partial T$.
For Newton's law, $D = m$ is a constant and the tricanonical form $Au =
\Lambda^* D \Lambda u = -m u'' = f$ is Newton's equilibrium equation. The quadratic form

$$V(\Lambda u) = \tfrac{1}{2}\langle u, Au\rangle = \tfrac{1}{2}\langle \Lambda u ; D\Lambda u\rangle = \int_T \tfrac{1}{2} m u'^2 \, dt$$

represents the internal (or kinetic) energy of the system, and the linear term

$$U(u) = \int_T u f \, dt$$

represents the external energy of the system. The function $P(u) = V(\Lambda u) -
U(u)$ is called the total action, which is a convex functional.
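
A discrete analogue makes the factorization $A = \Lambda^* D \Lambda = -m\,d^2/dt^2$ concrete. The sketch below assumes a uniform grid on $[0, 1]$, homogeneous boundary values $u(0) = u(1) = 0$ (so that the boundary term in the integration by parts vanishes), and a forward-difference matrix standing in for $\Lambda = d/dt$; the grid size, mass, and test function are illustrative choices only.

```python
import numpy as np

N, T_end, m = 200, 1.0, 2.0
h = T_end / N
t = np.linspace(0.0, T_end, N + 1)[1:-1]    # interior nodes; u(0) = u(T_end) = 0

# Forward differences: (Lam u)_i = (u_{i+1} - u_i) / h for i = 0, ..., N-1,
# acting on the N-1 interior unknowns (boundary values are zero).
Lam = (np.eye(N, N - 1, k=0) - np.eye(N, N - 1, k=-1)) / h
D = m * np.eye(N)                           # constitutive operator D = m

A = Lam.T @ D @ Lam                         # discrete version of -m d^2/dt^2

u = np.sin(np.pi * t)                       # test function with -m u'' = m pi^2 u
lhs = A @ u
rhs = m * np.pi**2 * u
print(np.max(np.abs(lhs - rhs)))
```

The transpose `Lam.T` plays the role of $\Lambda^* = -d/dt$: it is a backward difference with the opposite sign, which is the discrete counterpart of integration by parts.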

For Einstein's law, however, $D = m(t) = m_0/\sqrt{1 - v^2/c^2}$ depends on the
velocity $v = u'$, where $m_0 > 0$ is a constant and $c$ is the speed of light. In
this case, the tricanonical form $Au = f$ leads to Einstein's theory of special
relativity:

$$-\frac{d}{dt}\left(\frac{m_0}{\sqrt{1 - u'^2/c^2}}\,\frac{d}{dt}u\right) = f.$$

The kinetic energy

$$V(v) = \int_T -m_0 c^2 \sqrt{1 - v^2/c^2}\; dt$$

is no longer quadratic, but is still a convex functional on $\mathcal{V}_a = \{v \in
\mathcal{L}^\infty(T) \mid v(t) < c,\ \forall t \in T\}$. By using the canonical dual
transformation, the nonlinear minimization problem $(\mathcal{P})$ can be solved analytically (see
Gao, 2000b).

In mass-spring systems, $A = -(m\partial_{tt} + k)$ and the fundamental equation
(8.2) has the form:

$$Au = -m u'' - k u = f.$$

The additional term $ku$ represents the spring force and $k > 0$ is a spring
constant. In this case, if we let $\Lambda = (\partial_t, 1)^T$ be a vector-valued operator, the
second-order linear differential operator $A$ can still be written in the $\Lambda^* D \Lambda$
form as

$$A = -(m\partial_{tt} + k) = (-\partial_t, 1)\begin{pmatrix} m & 0 \\ 0 & -k \end{pmatrix}\begin{pmatrix} \partial_t \\ 1 \end{pmatrix}. \tag{8.10}$$


As evident here, if we let $\Lambda = (\partial_t, 1)^T$ be a vector-valued operator, the
operator $D$ is indefinite. However, if we let $\Lambda = \partial_t$, then similar to (8.9), we have
$D = m$, which is positive definite. Thus in this dynamical system, we have

$$V(v) = \int_T \tfrac{1}{2} m v^2 \, dt, \qquad U(u) = \int_T \left(\tfrac{1}{2} k u^2 + u f\right) dt,$$

where the quadratic function $U(u)$ represents the total potential energy. The
quadratic functional given by

$$P(u) = V(\Lambda u) - U(u) = \int_T \tfrac{1}{2} m u'^2 \, dt - \int_T \left(\tfrac{1}{2} k u^2 + u f\right) dt \tag{8.11}$$

is the well-known total action, which is again a d.c. functional.
Actually, every function $P(u) \in C^2$ is d.c. on any compact convex set $\mathcal{U}_k$,
and any d.c. optimization problem can be reduced to the canonical form (see
Tuy, 1995):

$$\min\{V(\Lambda u) \;:\; U(u) \le 0,\ G(u) \ge 0\}, \tag{8.12}$$

where $V$, $U$, and $G$ are convex functions. In the next section, we demonstrate
how the tricanonical $\Lambda^* D \Lambda$-operator theory serves as a framework for the
Lagrangian duality theory.


8.3 Canonical Lagrangian Duality Theory 


Classical Lagrangian duality was originally studied by Lagrange in analytical 
mechanics. In engineering mechanics it has been recognized as the
complementary variational principle, and has been studied extensively for
more than two centuries. In this section, we show its connection to
constrained optimization/variational problems. In addition to the well-known
saddle Lagrangian duality theory, a so-called super-Lagrangian duality is pre- 
sented within a unified framework, which leads to a biduality theorem in d.c. 
programming and convex Hamiltonian systems. 
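Recall the general primal problem (8.8)

$$(\mathcal{P}):\quad \min\{P(u) = V(\Lambda u) - U(u) \;:\; u \in \mathcal{U}_k\}, \tag{8.13}$$

where $V : \mathcal{V}_a \subset \mathcal{V} \to \mathbb{R}$ is a canonical function, $U : \mathcal{U}_a \to \mathbb{R}$ is a Gâteaux
differentiable function, either linear or canonical, and $\mathcal{U}_k = \{u \in \mathcal{U}_a \mid \Lambda u \in
\mathcal{V}_a\}$ is a convex feasible set. Without loss of generality, we assume that the
geometrical operator $\Lambda : \mathcal{U}_a \to \mathcal{V}_a$ can be chosen in a way such that the
canonical function $V : \mathcal{V}_a \to \mathbb{R}$ is convex. By the definition of the canonical
function, the duality relation $v^* = \delta V(v) : \mathcal{V}_a \to \mathcal{V}_a^*$ leads to the following
Fenchel-Young equality on $\mathcal{V}_a \times \mathcal{V}_a^*$,

$$V(v) = \langle v ; v^*\rangle - V^*(v^*).$$

Substituting this into equation (8.13), the Lagrangian $L(u, v^*) : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$
associated with the canonical problem $(\mathcal{P})$ can be defined by

$$L(u, v^*) = \langle \Lambda u ; v^*\rangle - V^*(v^*) - U(u). \tag{8.14}$$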


Definition 8.2. (Canonical Lagrangian) A function $L : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$
associated with the problem $(\mathcal{P})$ is called a canonical Lagrangian if it is a
canonical function on $\mathcal{V}_a^*$ and a canonical or linear function on $\mathcal{U}_a$.


The criticality condition $\delta L(\bar{u}, \bar{v}^*) = 0$ leads to the well-known Lagrange
equations:

$$\Lambda \bar{u} = \delta V^*(\bar{v}^*), \qquad \Lambda^* \bar{v}^* = \delta U(\bar{u}). \tag{8.15}$$

Because $V : \mathcal{V}_a \to \mathbb{R}$ is a canonical function, the Lagrange equations
(8.15) are equivalent to $\Lambda^* \delta V(\Lambda \bar{u}) = \delta U(\bar{u})$. If $(\bar{u}, \bar{v}^*)$ is a critical point of
$L(u, v^*)$, then $\bar{u}$ is a critical point of $P(u)$ on $\mathcal{U}_k$.

Because the canonical function $V$ is assumed to be convex on $\mathcal{V}_a$, the
canonical Lagrangian $L(u, v^*)$ is concave on $\mathcal{V}_a^*$. Thus, the extremality
conditions of the critical point of $L(u, v^*)$ depend on the convexity of the function
$U(u)$. Two important duality theories are associated with the canonical
Lagrangian, as shown in Sections 8.3.1 and 8.3.2 below.


8.3.1 Saddle-Lagrangian Duality 


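First, we assume that $U(u)$ is a concave function on $\mathcal{U}_a$. In this case, $L(u, v^*)$
is a saddle-Lagrangian; that is, $L(u, v^*)$ is convex on $\mathcal{U}_a$ and concave on $\mathcal{V}_a^*$.
By the traditional definition, a pair $(\bar{u}, \bar{v}^*)$ is called a saddle point of $L(u, v^*)$
on $\mathcal{U}_a \times \mathcal{V}_a^*$ if

$$L(u, \bar{v}^*) \ge L(\bar{u}, \bar{v}^*) \ge L(\bar{u}, v^*), \quad \forall (u, v^*) \in \mathcal{U}_a \times \mathcal{V}_a^*. \tag{8.16}$$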


The classical saddle-Lagrangian duality theory can be presented precisely 
by the following theorem. 


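Theorem 8.1. (Saddle-Min-Max Theorem) Suppose that the function
$U : \mathcal{U}_a \to \mathbb{R}$ is concave and there exists a linear operator $\Lambda : \mathcal{U}_a \to \mathcal{V}_a$
such that the canonical Lagrangian $L : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ is a saddle function. If
$(\bar{u}, \bar{v}^*) \in \mathcal{U}_a \times \mathcal{V}_a^*$ is a critical point of $L(u, v^*)$, then

$$\min_{u \in \mathcal{U}_a} \max_{v^* \in \mathcal{V}_a^*} L(u, v^*) = L(\bar{u}, \bar{v}^*) = \max_{v^* \in \mathcal{V}_a^*} \min_{u \in \mathcal{U}_a} L(u, v^*). \tag{8.17}$$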


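By using this theorem, the dual function $P^d(v^*)$ can be defined as

$$P^d(v^*) = \min_{u \in \mathcal{U}_a} L(u, v^*) = U^\flat(\Lambda^* v^*) - V^*(v^*), \tag{8.18}$$

where $U^\flat : \mathcal{U}_a^* \to \mathbb{R}$ is a Fenchel conjugate function of $U$ defined by the
Fenchel transformation

$$U^\flat(u^*) = \min_{u \in \mathcal{U}_a} \{\langle u, u^*\rangle - U(u)\}. \tag{8.19}$$

Because $U(u)$ is a concave function on $\mathcal{U}_a$, the Fenchel conjugate $U^\flat$ is also a
concave function on $\mathcal{U}_a^* \subset \mathcal{U}^*$. Thus, on the dual feasible space $\mathcal{V}_k^*$ defined by

$$\mathcal{V}_k^* = \{v^* \in \mathcal{V}_a^* \mid \Lambda^* v^* \in \mathcal{U}_a^*\}, \tag{8.20}$$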
the problem, which is dual to $(\mathcal{P})$, can be proposed as the following,

$$(\mathcal{P}^d):\quad \max\{P^d(v^*) \;:\; v^* \in \mathcal{V}_k^*\}. \tag{8.21}$$

The saddle min-max duality theory leads to the following well-known result.


Theorem 8.2. (Saddle-Lagrangian Duality Theorem) Suppose that
$L(u, v^*) : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ is a canonical saddle Lagrangian and $(\bar{u}, \bar{v}^*)$ is a
critical point of $L(u, v^*)$. Then $\bar{u}$ is a global minimizer of $P(u)$, $\bar{v}^*$ is a global
maximizer of $P^d(v^*)$, and

$$\min_{u \in \mathcal{U}_k} P(u) = P(\bar{u}) = L(\bar{u}, \bar{v}^*) = P^d(\bar{v}^*) = \max_{v^* \in \mathcal{V}_k^*} P^d(v^*). \tag{8.22}$$


Particularly, for a given $f \in \mathcal{U}^*$ such that $U(u) = \langle u, f\rangle$ is a linear function
on $\mathcal{U}_a$, the Fenchel conjugate $U^\flat(u^*)$ can be computed as

$$U^\flat(u^*) = \min_{u \in \mathcal{U}_a} \langle u, u^* - f\rangle = \begin{cases} 0 & \text{if } u^* = f, \\ -\infty & \text{otherwise.} \end{cases} \tag{8.23}$$

Its effective domain is $\mathcal{U}_a^* = \{u^* \in \mathcal{U}^* \mid u^* = f\}$. Thus, the dual feasible space
can be well defined as $\mathcal{V}_k^* = \{v^* \in \mathcal{V}_a^* \mid \Lambda^* v^* = f\}$, and the dual problem is
a concave maximization problem with a linear constraint:

$$(\mathcal{P}^d):\quad \max\{P^d(v^*) = -V^*(v^*) \;:\; \Lambda^* v^* = f,\ v^* \in \mathcal{V}_a^*\}. \tag{8.24}$$

By using the Lagrange multiplier $u \in \mathcal{U}_a$ to relax the linear constraint, we
have

$$L(u, v^*) = -V^*(v^*) + \langle u, \Lambda^* v^* - f\rangle,$$

which is exactly the canonical Lagrangian (8.14) associated with the problem
$(\mathcal{P})$ if the Lagrange multiplier $u$ is in $\mathcal{U}_a$ such that $V(\Lambda u)$ is a canonical
function on $\mathcal{V}_a$. This shows that the classical Lagrangian can be obtained in
two ways:


1. Legendre transformation method (by choosing a proper linear
operator $\Lambda$ in $(\mathcal{P})$)

2. Classical Lagrange multiplier method (by relaxing the constraint
$\Lambda^* v^* = f$ in $(\mathcal{P}^d)$)


In engineering mechanics, because V* is called the complementary energy, 
the constrained problem 


$$\min\{V^*(v^*) \;:\; \Lambda^* v^* = f,\ v^* \in \mathcal{V}_a^*\}$$


is also called the complementary variational problem and the Lagrangian 
L(u,v*) is called the generalized complementary energy. In computational 
mechanics, the saddle-Lagrangian duality theory serves as a foundation for 
mixed and hybrid finite element methods. 
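
The saddle duality statement (8.22) can be verified on a small convex quadratic problem with linear $U(u) = \langle u, f\rangle$, for which the dual (8.24) is $\max\{-V^*(v^*) : \Lambda^T v^* = f\}$. The sketch below uses hypothetical data (`Lam`, `D`, `f`) and assumes $\Lambda$ has full column rank so that $\Lambda^T D \Lambda$ is invertible.

```python
import numpy as np

# Hypothetical data: Lam is 3 x 2 with full column rank, D is positive definite.
Lam = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 2.0]])
D = np.diag([2.0, 1.0, 0.5])
f = np.array([1.0, -1.0])

def P(u):
    """Primal: P(u) = 1/2 <Lam u; D Lam u> - <u, f>."""
    Lu = Lam @ u
    return 0.5 * Lu @ D @ Lu - u @ f

def P_dual(v_star):
    """Dual objective -V*(v*) = -1/2 <D^{-1} v*; v*>, valid when Lam^T v* = f."""
    return -0.5 * v_star @ np.linalg.solve(D, v_star)

u_bar = np.linalg.solve(Lam.T @ D @ Lam, f)   # solves Lam^T D Lam u = f
v_bar = D @ Lam @ u_bar                       # duality equation v* = D Lam u

# Another dual-feasible point: add a component from the null space of Lam^T.
rng = np.random.default_rng(0)
null_proj = np.eye(3) - Lam @ np.linalg.pinv(Lam)
v_other = v_bar + null_proj @ rng.standard_normal(3)

print(P(u_bar), P_dual(v_bar), P_dual(v_other))
```

The primal and dual values coincide at the critical pair, while any other dual-feasible point gives a smaller dual value, as the min-max equality requires.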
8.3.2 Super-Lagrangian Duality


If the function $U : \mathcal{U}_a \to \mathbb{R}$ is convex, the canonical Lagrangian $L(u, v^*)$ is
concave in each of its variables $u \in \mathcal{U}_a$ and $v^* \in \mathcal{V}_a^*$. However, $L(u, v^*)$ may
not be concave in $(u, v^*) \in \mathcal{U}_a \times \mathcal{V}_a^*$ (see examples in Gao, 2000a). In this
case, consider the following definition that was introduced in Gao (2000a).


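Definition 8.3. A point $(\bar{u}, \bar{v}^*)$ is said to be a supercritical (or $\partial^+$-critical)
point of $L$ on $\mathcal{U}_a \times \mathcal{V}_a^*$ if

$$L(\bar{u}, v^*) \le L(\bar{u}, \bar{v}^*) \ge L(u, \bar{v}^*), \quad \forall (u, v^*) \in \mathcal{U}_a \times \mathcal{V}_a^*. \tag{8.25}$$

A function $L : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ is said to be a supercritical (or $\partial^+$) function
on $\mathcal{U}_a \times \mathcal{V}_a^*$ if it is concave in each of its arguments; that is,

$$L(\,\cdot\,, v^*) : \mathcal{U}_a \to \mathbb{R} \ \text{ is concave}, \quad \forall v^* \in \mathcal{V}_a^*,$$

$$L(u, \,\cdot\,) : \mathcal{V}_a^* \to \mathbb{R} \ \text{ is concave}, \quad \forall u \in \mathcal{U}_a.$$

In particular, if the supercritical function $L : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ is a Lagrange
form, it is called a super-Lagrangian.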


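From a duality viewpoint, a point $(\bar{u}, \bar{v}^*)$ is said to be a subcritical (or
$\partial^-$-critical) point of $L$ on $\mathcal{U}_a \times \mathcal{V}_a^*$ if

$$L(\bar{u}, v^*) \ge L(\bar{u}, \bar{v}^*) \le L(u, \bar{v}^*), \quad \forall (u, v^*) \in \mathcal{U}_a \times \mathcal{V}_a^*. \tag{8.26}$$

This definition comes from the subdifferential (see Gao, 2000a):

$$\bar{v}^* \in \partial^- V(\bar{v}) = \{v^* \in \mathcal{V}_a^* \mid V(v) - V(\bar{v}) \ge \langle v - \bar{v} ; v^*\rangle,\ \forall v \in \mathcal{V}_a\}.$$

Clearly, $(\bar{u}, \bar{v}^*)$ is a supercritical point of $L$ on $\mathcal{U}_a \times \mathcal{V}_a^*$ if and only if it is
a subcritical point of $-L$ on $\mathcal{U}_a \times \mathcal{V}_a^*$.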


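Theorem 8.3. (Super-Lagrangian Duality Theorem (Gao, 2000a))
Suppose that there exists a linear operator $\Lambda : \mathcal{U}_a \to \mathcal{V}_a$ such that $L : \mathcal{U}_a \times
\mathcal{V}_a^* \to \mathbb{R}$ is a super-Lagrangian. If $(\bar{u}, \bar{v}^*) \in \mathcal{U}_a \times \mathcal{V}_a^*$ is a supercritical point of
$L(u, v^*)$ on $\mathcal{U}_a \times \mathcal{V}_a^*$, then either the supermaximum theorem in the form

$$\max_{u \in \mathcal{U}_a} \max_{v^* \in \mathcal{V}_a^*} L(u, v^*) = L(\bar{u}, \bar{v}^*) = \max_{v^* \in \mathcal{V}_a^*} \max_{u \in \mathcal{U}_a} L(u, v^*) \tag{8.27}$$

holds, or the supermin-max theorem in the form

$$\min_{u \in \mathcal{U}_a} \max_{v^* \in \mathcal{V}_a^*} L(u, v^*) = L(\bar{u}, \bar{v}^*) = \min_{v^* \in \mathcal{V}_a^*} \max_{u \in \mathcal{U}_a} L(u, v^*) \tag{8.28}$$

holds.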


Based on this super-Lagrangian duality theorem, a dual function to the
nonconvex d.c. function $P(u) = V(\Lambda u) - U(u)$ can be formulated as

$$P^d(v^*) = \max_{u \in \mathcal{U}_a} L(u, v^*) = U^\sharp(\Lambda^* v^*) - V^*(v^*), \tag{8.29}$$

where $U^\sharp : \mathcal{U}^* \to \mathbb{R}$ is defined by the super-Fenchel transformation

$$U^\sharp(u^*) = \max\{\langle u, u^*\rangle - U(u) \;:\; u \in \mathcal{U}_a\}. \tag{8.30}$$

Suppose that $\mathcal{U}_a^* \subset \mathcal{U}^*$ is an effective domain of $U^\sharp$. Then on the dual feasible
space $\mathcal{V}_k^* = \{v^* \in \mathcal{V}_a^* \mid \Lambda^* v^* \in \mathcal{U}_a^*\}$, we have the following result.


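Theorem 8.4. (Biduality Theory (Gao, 2000a)) If $(\bar{u}, \bar{v}^*)$ is a supercritical
point of $L(u, v^*)$, then either the double-min theorem in the form

$$\min_{u \in \mathcal{U}_k} P(u) = P(\bar{u}) = L(\bar{u}, \bar{v}^*) = P^d(\bar{v}^*) = \min_{v^* \in \mathcal{V}_k^*} P^d(v^*) \tag{8.31}$$

holds, or the double-max theorem in the form

$$\max_{u \in \mathcal{U}_k} P(u) = P(\bar{u}) = L(\bar{u}, \bar{v}^*) = P^d(\bar{v}^*) = \max_{v^* \in \mathcal{V}_k^*} P^d(v^*) \tag{8.32}$$

holds.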


The Hamiltonian $H : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ associated with the Lagrangian is
defined by

$$H(u, v^*) = \langle \Lambda u ; v^*\rangle - L(u, v^*) = V^*(v^*) + U(u). \tag{8.33}$$

Clearly, if $L(u, v^*)$ is a super-Lagrangian, the Hamiltonian $H(u, v^*)$ is convex
in each of its variables and, in terms of $H(u, v^*)$, the Lagrange equations (8.15)
can be written in the so-called Hamiltonian canonical form:

$$\Lambda u = \delta_{v^*} H(u, v^*), \qquad \Lambda^* v^* = \delta_u H(u, v^*). \tag{8.34}$$


However, this nice symmetrical form and the convexity of the Hamiltonian do 
not afford new insights into understanding the extremality conditions of the 
nonconvex problem. The super-Lagrangian duality theory plays an important 
role in d.c. programming, convex Hamilton systems, and global optimization. 


8.3.3 Applications in Quadratic Programming and
Commentary 


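Now, let us consider the nonconvex quadratic programming problem $(\mathcal{P}_q)$
where the cost function is a d.c. function

$$P(u) = \tfrac{1}{2}\langle \Lambda u ; D\Lambda u\rangle - \tfrac{1}{2}\langle u, Cu\rangle - \langle u, f\rangle$$

as discussed in Section 8.2.1, where $D$ is a positive definite matrix in $\mathbb{R}^{m \times m}$, and
$C \in \mathbb{R}^{n \times n}$ is positive semidefinite. Because $U(u) = \tfrac{1}{2}\langle u, Cu\rangle + \langle u, f\rangle$ in this
case is convex, the Lagrangian

$$L(u, v^*) = \langle \Lambda u ; v^*\rangle - \tfrac{1}{2}\langle D^{-1}v^* ; v^*\rangle - \tfrac{1}{2}\langle u, Cu\rangle - \langle u, f\rangle$$

is a super-Lagrangian. By using the super-Fenchel transformation, we have

$$U^\sharp(u^*) = \max_{u \in \mathcal{U}_a}\{\langle u, u^* - f\rangle - \tfrac{1}{2}\langle u, Cu\rangle\} = \tfrac{1}{2}\langle C^+(u^* - f), (u^* - f)\rangle,$$

subject to $u^* - f \in \mathcal{C}(C)$, where $C^+$ is a pseudo-inverse of $C$ and $\mathcal{C}(C)$
represents the column space of $C$. Thus, on the dual feasible space

$$\mathcal{V}_k^* = \{v^* \in \mathcal{V}_a^* \subset \mathbb{R}^m \mid \Lambda^T v^* - f \in \mathcal{C}(C)\}, \tag{8.35}$$

the dual function

$$P^d(v^*) = \tfrac{1}{2}\langle C^+(\Lambda^T v^* - f), \Lambda^T v^* - f\rangle - \tfrac{1}{2}\langle D^{-1}v^* ; v^*\rangle \tag{8.36}$$

is also a d.c. function. The biduality theorem shows that the optimal values
of the primal and dual problems are equal. If $\bar{u}$ solves the primal (either
minimization or maximization) and $\Lambda^* \bar{v}^* - f \in \partial^- U(\bar{u})$, then $\bar{v}^*$ solves the
dual.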
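
The equality of primal and dual values at a critical pair can be checked numerically from (8.36). The sketch below uses small hypothetical matrices with $m = 1 < n = 2$ and a singular $C$, so that the pseudo-inverse and the column-space condition (8.35) both matter; the names `P_dual`, `u_bar`, and `v_bar` are illustrative.

```python
import numpy as np

# Hypothetical data with m = 1 < n = 2.
Lam = np.array([[1.0, 1.0]])
D = np.array([[3.0]])                  # positive definite
C = np.diag([1.0, 0.0])                # positive semidefinite and singular
f = np.array([1.0, 2.0])
A = Lam.T @ D @ Lam - C                # indefinite: A = Lam^T D Lam - C

def P(u):
    """Primal d.c. function P(u) = 1/2 <Lam u; D Lam u> - 1/2 <u, Cu> - <u, f>."""
    Lu = Lam @ u
    return 0.5 * Lu @ D @ Lu - 0.5 * u @ C @ u - u @ f

def P_dual(v_star, tol=1e-10):
    """Dual function (8.36); raises if v* violates the constraint in (8.35)."""
    r = Lam.T @ v_star - f
    C_pinv = np.linalg.pinv(C)
    if np.linalg.norm(C @ C_pinv @ r - r) > tol:
        raise ValueError("Lam^T v* - f is not in the column space of C")
    return 0.5 * r @ C_pinv @ r - 0.5 * v_star @ np.linalg.solve(D, v_star)

u_bar = np.linalg.solve(A, f)          # critical point of P: A u = f
v_bar = D @ Lam @ u_bar                # associated dual variable v* = D Lam u

print(u_bar, v_bar, P(u_bar), P_dual(v_bar))
```

Note that the dual variable lives in $\mathbb{R}^m$ with $m < n$, so the dual problem has fewer unknowns than the primal one in this example.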

One of the earliest and best known double-min duality schemes was
formulated by Toland (1978) for the d.c. minimization problem

$$\min\{W(u) - U(u) \;:\; u \in \operatorname{dom} W\}, \tag{8.37}$$

where $W(u)$ is an arbitrary function, $U(u)$ is a convex proper lsc function on
$\mathbb{R}^n$, and $\operatorname{dom} W$ represents the effective domain of $W$. The dual problem is

$$\min\{U^\sharp(u^*) - W^\sharp(u^*) \;:\; u^* \in \operatorname{dom} U^\sharp\}, \tag{8.38}$$

which is also a d.c. minimization problem in $\mathbb{R}^n$. The generalizations were
made by Auchmuty (1983) to general nonconvex functionals with a linear op- 
erator A. Since then, several important duality concepts have been developed 
and studied for nonconvex optimization and d.c. programming by Crouzeix 
(1981), Hiriart-Urruty (1985), Singer (1998), Penot and Volle (1990), Tuy 
(1995), Thach (1993, 1995), and many others. A detailed review on duality 
in d.c. programming appears in Tuy (1995). Much of the foregoing discussion
is based on generalized nonconvex functionals, which are allowed to be
extended-real-valued. In order to avoid difficulties such as $\infty - \infty$, a modified
version of the double-min duality in optimization was presented in
Rockafellar and Wets (1998). It is traditional in the calculus of variations and
optimization that the primal problem is always taken to be a minimization
problem. However, this tradition somewhat obscures our view of more general
problems. In convex Hamiltonian systems where $V(\Lambda u) = \tfrac{1}{2}\langle \Lambda u ; D\Lambda u\rangle$ is
a kinetic energy function and $U(u) = \tfrac{1}{2}\langle u, Cu\rangle + \langle u, f\rangle$ is a total potential
energy function, the d.c. function $P(u) = V(\Lambda u) - U(u)$ represents a total
action of the system. As pointed out in Ekeland (1990) and Gao (2000a), in
the context of convex dynamical systems, the least action principle is somewhat
misleading because the action is a d.c. function that takes minimum and
maximum values periodically over the time domain. Both the min- and the
max-primal problems have to be considered simultaneously in a period. The
biduality theorem reveals a periodic behavior of dynamical systems.

In two-person game theory, the biduality theory shows that the d.c. pro- 
gramming problem has two Nash equilibrium points. 

The super-Lagrangian duality and the associated biduality theory were
first proposed in the monograph Gao (2000a). Based on this theory and
the tricanonical form $\Lambda^* D \Lambda$, we reformulated the nonconvex quadratic
programming problem in the dual form (8.36), which is well defined on the
dual feasible space $\mathcal{V}_k^* \subset \mathbb{R}^m$ given by (8.35). Because $m < n$, we believe this new
dual form will play an important role in nonconvex quadratic programming
theory.


8.4 Complementary Variational Principles in 
Continuum Mechanics 


This section presents two simple applications of the canonical Lagrange du- 
ality theory in continuum mechanics. The first application shows the connec- 
tion between the mathematical theory of saddle-Lagrangian duality and the 
complementary energy variational principles in static linear elasticity, which 
are well known in solid mechanics and computational mechanics. The second
application, of the super-Lagrangian duality theory to convex Hamiltonian
systems, may bring some important insights into extremality conditions in
dynamic systems.


8.4.1 Linear Elasticity 


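Let us consider an elastic material in $\mathbb{R}^3$ occupying a simply connected domain
$\Omega \subset \mathbb{R}^3$ with boundary $\Gamma = \partial\Omega = \Gamma_u \cup \Gamma_t$ such that $\Gamma_u \cap \Gamma_t = \emptyset$. On $\Gamma_u$,
the boundary displacement $\bar{u}$ is given, whereas on $\Gamma_t$, a surface traction $\bar{t}$ is
prescribed. Suppose that the elastic body is subjected to a distributed force
field $f$. The equilibrium equation $Au = f$ has the following form,

$$-\frac{\partial}{\partial x_j}\left(D_{ijkl}\frac{\partial u_k}{\partial x_l}\right) = f_i(x), \quad \forall x \in \Omega, \tag{8.39}$$

where $D = \{D_{ijkl}\}$ $(i, j, k, l = 1, 2, 3)$ is a positive definite fourth-order
elastic tensor, satisfying $D_{ijkl} = D_{jikl} = D_{klij}$, and Einstein's summation
convention over the repeated subindices is used here. In this problem, $A =
-\operatorname{div} D \operatorname{grad}$ is an elliptic operator, $\Lambda = \operatorname{grad}$ is a gradient, and $v = \operatorname{grad} u$
is called the deformation gradient. Its symmetrical part is an infinitesimal
strain tensor, denoted as $\epsilon = \tfrac{1}{2}(\nabla u + (\nabla u)^T)$. The dual variable $v^* = D\epsilon$
is a stress tensor, usually denoted by $\sigma$. In this infinite-dimensional system
$\mathcal{U} = \mathcal{L}^2(\Omega; \mathbb{R}^3) = \mathcal{U}^*$ and $\mathcal{V} = \mathcal{L}^2(\Omega; \mathbb{R}^{3 \times 3}) = \mathcal{V}^*$. The bilinear forms are
defined by

$$\langle u, f\rangle = \int_\Omega u \cdot f \, d\Omega, \qquad \langle \epsilon ; \sigma\rangle = \int_\Omega \epsilon : \sigma \, d\Omega,$$

where $\epsilon : \sigma = \operatorname{tr}(\epsilon \cdot \sigma) = \epsilon_{ij}\sigma_{ij}$. The adjoint operator $\Lambda^*$ in this case is
$\Lambda^* = \{-\operatorname{div} \text{ in } \Omega,\ n\,\cdot \text{ on } \Gamma_t\}$, and $-\operatorname{div}$ is also called the formal adjoint of
$\Lambda = \operatorname{grad}$. Let

$$\mathcal{U}_a = \{u \in \mathcal{U} \mid u(x) = \bar{u}(x),\ \forall x \in \Gamma_u\},$$

$$\mathcal{V}_a = \{\epsilon \in \mathcal{V} \mid \epsilon(x) = \epsilon^T(x),\ \forall x \in \Omega\}.$$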

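Thus on the feasible space, that is, the so-called kinetically admissible space
$\mathcal{U}_k = \{u \in \mathcal{U}_a \mid \Lambda u \in \mathcal{V}_a\}$, the quadratic form

$$P(u) = \int_\Omega \tfrac{1}{2}(\nabla u) : D : (\nabla u) \, d\Omega - \int_\Omega u \cdot f \, d\Omega - \int_{\Gamma_t} u \cdot \bar{t} \, d\Gamma \tag{8.40}$$

is the so-called total potential of the deformed elastic body. The minimal
potential principle leads to the convex variational problem

$$\min\{P(u) \;:\; u \in \mathcal{U}_k\}. \tag{8.41}$$

The functional $V(\epsilon) = \tfrac{1}{2}\langle \epsilon ; D\epsilon\rangle$ is called the internal (or stored) potential. Its
Legendre conjugate

$$V^*(\sigma) = \{\langle \epsilon ; \sigma\rangle - V(\epsilon) \mid \sigma = D : \epsilon\} = \int_\Omega \tfrac{1}{2}\sigma : D^{-1} : \sigma \, d\Omega$$

is known as the complementary energy in solid mechanics. Because the external
potential

$$U(u) = \int_\Omega u \cdot f \, d\Omega + \int_{\Gamma_t} u \cdot \bar{t} \, d\Gamma$$

is linear, the Lagrangian associated
with the total potential $P(u)$, as given by

$$L(u, \sigma) = \int_\Omega \left[(\nabla u) : \sigma - \tfrac{1}{2}\sigma : D^{-1} : \sigma - u \cdot f\right] d\Omega - \int_{\Gamma_t} u \cdot \bar{t} \, d\Gamma, \tag{8.42}$$

can be considered as a saddle Lagrangian, which is the well-known generalized
Hellinger-Reissner complementary energy. Thus, by the saddle Lagrangian
duality, the dual functional $P^d(\sigma)$ is defined by

$$P^d(\sigma) = \min_{u \in \mathcal{U}_a} L(u, \sigma) = U^\flat(\Lambda^* \sigma) - V^*(\sigma),$$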


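where

$$U^\flat(\Lambda^* \sigma) = \min_{u \in \mathcal{U}_a}\left\{\int_\Omega (\nabla u) : \sigma \, d\Omega - \int_\Omega u \cdot f \, d\Omega - \int_{\Gamma_t} u \cdot \bar{t} \, d\Gamma\right\}
= \begin{cases} \displaystyle\int_{\Gamma_u} \bar{u} \cdot \sigma \cdot n \, d\Gamma & \text{if } -\operatorname{div}\sigma = f \text{ in } \Omega,\ \sigma \cdot n = \bar{t} \text{ on } \Gamma_t, \\ -\infty & \text{otherwise.} \end{cases}$$

Thus, on the dual feasible space, that is, the so-called statically admissible
space defined by

$$\mathcal{V}_k^* = \{\sigma \in \mathcal{V}_a^* \mid -\operatorname{div}\sigma = f \text{ in } \Omega,\ \sigma \cdot n = \bar{t} \text{ on } \Gamma_t\},$$

the dual problem for this linear elasticity case is given by

$$\max\left\{P^d(\sigma) = \int_{\Gamma_u} \bar{u} \cdot \sigma \cdot n \, d\Gamma - \int_\Omega \tfrac{1}{2}\sigma : D^{-1} : \sigma \, d\Omega \;:\; \sigma \in \mathcal{V}_k^*\right\}. \tag{8.43}$$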
Q 


This is a concave maximization problem with linear constraints. The La- 
grange multiplier u for the equilibrium constraints is the solution of the pri- 
mal problem. 

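In continuum mechanics, the functional $-P^d$, denoted by

$$P^c(\sigma) = \int_\Omega \tfrac{1}{2}\sigma : D^{-1} : \sigma \, d\Omega - \int_{\Gamma_u} \bar{u} \cdot \sigma \cdot n \, d\Gamma,$$

is called the total complementary energy. Thus, instead of the dual problem
(8.43), the minimum complementary variational problem

$$\min\{P^c(\sigma) \;:\; \sigma \in \mathcal{V}_k^*\}$$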


has been extensively studied by engineers, which serves as a foundation for 
the so-called stress, or equilibrium, finite element methods. 
8.4.2 Convex Hamiltonian Systems


Recall the mass-spring dynamical system discussed in Section 8.2, where the
total action is a d.c. function of the form

$$P(u) = V(\Lambda u) - U(u) = \int_T \tfrac{1}{2} m (u')^2 \, dt - \int_T \left[\tfrac{1}{2} k u^2 + u f\right] dt. \tag{8.44}$$

The Lagrangian

$$L(u, p) = \int_T \left[u' p - \tfrac{1}{2} m^{-1} p^2 - \tfrac{1}{2} k u^2\right] dt - \int_T u f \, dt$$

is not a saddle function, thus the Hamiltonian

$$H(u, p) = \langle \Lambda u ; p\rangle - L(u, p) = \int_T \left[\tfrac{1}{2} m^{-1} p^2 + \tfrac{1}{2} k u^2\right] dt + \int_T u f \, dt \tag{8.45}$$

was extensively used in classical dynamical systems. One of the main reasons
for this could be that $H(u, p)$ is convex. Thus, the original differential
equation $Au = -m u'' - k u = f$ can be written in the well-known Hamiltonian
canonical form:

$$\Lambda u = \delta_p H(u, p), \qquad \Lambda^* p = \delta_u H(u, p). \tag{8.46}$$


However, an important phenomenon has been hiding in the shadow of this 
convex Hamiltonian for centuries. Because L(u, p) is a super-Lagrangian, the 
dual action can be formulated as 


$$P^d(p) = \max_{u} L(u, p) = \frac{1}{2}\int_T k^{-1}(p' + f)^2 \, dt - \frac{1}{2}\int_T m^{-1} p^2 \, dt,$$

which is also a d.c. functional. The biduality theory

$$\min P(u) = \min P^d(p), \qquad \max P(u) = \max P^d(p)$$

shows that the well-known least action principle in periodic dynamical systems
is actually a misnomer; that is, the periodic solution $u(t)$ does not necessarily
minimize the total action $P(u)$: it could be either a minimizer or a maximizer,
depending on the time period (see Gao, 2000a).
8.5 Nonconvex Problems with Double-Well Energy


We now turn our attention to duality theory in nonconvex systems by
considering a very simple problem in $\mathbb{R}^n$:

$$(\mathcal{P}_w):\quad \min\left\{P(u) = \tfrac{1}{2}\alpha\left(\tfrac{1}{2}|Bu|^2 - \lambda\right)^2 - \langle u, f\rangle \;:\; u \in \mathbb{R}^n\right\}, \tag{8.47}$$

where $B \in \mathbb{R}^{m \times n}$ is a matrix, $\alpha, \lambda > 0$ are positive constants, and $|v|$ denotes
the Euclidean norm of $v$. The criticality condition $\delta P(u) = 0$ leads to a
coupled nonlinear algebraic system in $\mathbb{R}^n$:

$$\alpha\left(\tfrac{1}{2}|Bu|^2 - \lambda\right) B^T B u = f. \tag{8.48}$$

Clearly, it is difficult to solve this nonlinear system by direct methods. Also,
due to the nonconvexity of $P(u)$, any solution to this nonlinear system satisfies
only a necessary condition. The nonconvex function $W(v) = \tfrac{1}{2}\alpha(\tfrac{1}{2}|v|^2 - \lambda)^2$
is a so-called double-well energy, which was first studied by van der Waals
in fluid mechanics in 1895 (see Rowlinson, 1979). For each given parameter
$\lambda > 0$, $W(v)$ has two minimizers and one local maximizer (see Figure 8.2a).
The global and local minimizers depend on the input f (see Figure 8.2b). This 
double-well function has extensive applications in mathematical physics. In 
phase transitions of shape memory alloys, or in the mathematical theory of 
superconductivity, W(v) is the well-known Landau second-order free energy, 
and each of its local minimizers represents a possible phase state of the ma- 
terial. In quantum mechanics, if v represents the Higgs’ field strength, then 
W(v) is the energy. It was discovered in the context of postbuckling analysis 
of large deformed beam models, that the total potential is also a double-well 
energy (see Gao, 2000d), and each potential well represents a possible buck- 
led beam state. More examples can be found in a recent review article (Gao, 
2003b). 
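As a quick check of the statement about minimizers and the local maximizer, the scalar case can be worked out by hand. This is only a sketch for $m = 1$; for $m > 1$ the minimizers of $W$ form the sphere $|v|^2 = 2\lambda$ rather than two isolated points.

```latex
W(v) = \tfrac12\alpha\big(\tfrac12 v^2 - \lambda\big)^2, \qquad
W'(v) = \alpha\big(\tfrac12 v^2 - \lambda\big)v = 0
\;\Longleftrightarrow\; v \in \{0,\ \pm\sqrt{2\lambda}\},
\\[4pt]
W''(v) = \alpha\big(\tfrac32 v^2 - \lambda\big): \qquad
W''(0) = -\alpha\lambda < 0, \qquad W''(\pm\sqrt{2\lambda}) = 2\alpha\lambda > 0,
\\[4pt]
W(0) = \tfrac12\alpha\lambda^2 \ \text{(local maximum)}, \qquad
W(\pm\sqrt{2\lambda}) = 0 \ \text{(the two wells)}.
```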


(a) Graph of $W(u) = \tfrac12(\tfrac12 u^2 - \lambda)^2$    (b) Graphs of $P(u) = W(u) - fu$


Fig. 8.2 Double-well energy and nonconvex potential functions.


8.5.1 Classical Lagrangian and Duality Gap


If we choose $\Lambda = B$ as a linear operator, the primal function can be written
in the traditional form $P(u) = W(Bu) - U(u)$, where $U(u) = \langle u, f\rangle$ is a
linear function. Because the duality relation $v^* = \delta W(v) = \alpha(\tfrac12|v|^2 - \lambda)v$ is
not one-to-one, the Legendre conjugate

\[ W^*(v^*) = \mathrm{sta}\{\langle v, v^*\rangle - W(v) \;:\; v \in \mathbb{R}^m\} \]

is not uniquely defined. Thus, the pair $(v, v^*)$ associated with the nonconvex
function $W(v)$ is not a canonical dual pair. By using the Fenchel
transformation

\[ W^\sharp(v^*) = \max\{\langle v, v^*\rangle - W(v) \;:\; v \in \mathbb{R}^m\}, \]

the traditional Lagrangian (associated with the linear operator $\Lambda = B$) can
still be defined as

\[ L(u, v^*) = \langle Bu, v^*\rangle - W^\sharp(v^*) - \langle u, f\rangle. \tag{8.49} \]

Thus, the classical Lagrangian duality theory $P^\sharp(v^*) = \min_u L(u, v^*)$ leads
to the well-known Fenchel-Rockafellar dual problem

\[ (\mathcal{P}^\sharp):\quad \max_{v^*}\{P^\sharp(v^*) = -W^\sharp(v^*) \;:\; B^T v^* = f\}. \tag{8.50} \]

This is a linearly constrained concave maximization problem. The Lagrange
multiplier for the linear constraint set is $u$. However, due to the nonconvexity
of $W(v)$, the Fenchel-Young inequality

\[ W(v) + W^\sharp(v^*) \ge \langle v, v^*\rangle \]

leads only to a weak duality relation

\[ \min P \ge \max P^\sharp. \]

The nonzero value $\theta = \min P(u) - \max P^\sharp(v^*)$ is called the duality gap. This
duality gap shows that the classical Lagrange multiplier $u$ may not be a
solution to the primal problem. Thus, the Fenchel-Rockafellar duality theory can
be used mainly for solving convex problems. In order to eliminate this duality
gap, many modified Lagrangian dualities have been proposed in recent
years (see, for example, Aubin and Ekeland, 1976, Rubinov et al., 2001,
2003, Goh and Yang, 2002, Huang and Yang, 2003, Zhou and Yang, 2004).
Most of these mathematical approaches are based on penalization of a class
of augmented Lagrangian functions. On the other hand, the canonical duality
theory addressed in the next section is based on a fundamental truth in
physics; that is, physical variables appear in (canonical) pairs. The one-to-one
canonical duality relation leads to a perfect duality theory in mathematical
physics and global optimization.


8.5.2 Canonical Dual Transformation and Triality 
Theory 


In order to recover the duality gap, a canonical duality theory was developed
during the last 15 years: first in nonconvex mechanics and analysis (see Gao
and Strang, 1989a,b, Gao, 1997, 1998a, 2000a), then in global optimization
(see Gao, 2000a,c, 2003a, 2004b). The key idea of this theory is to choose a
suitable operator (usually nonlinear) $\xi = \Lambda(u)$ such that the nonconvex function
$W(u)$ can be written in the canonical form

\[ W(u) = V(\Lambda(u)), \]

where $V(\xi)$ is a canonical function of $\xi = \Lambda(u)$. For the present nonconvex
problem (8.47), instead of $\Lambda = B$, we choose

\[ \xi = \Lambda(u) = \tfrac12|Bu|^2, \tag{8.51} \]

which is a quadratic map from $\mathcal{U} = \mathbb{R}^n$ into $\mathcal{V}_a = \{\xi \in \mathbb{R} \mid \xi \ge 0\}$. Thus, the
canonical function

\[ V(\xi) = \tfrac12\alpha(\xi - \lambda)^2 \]

is simply a scalar-valued quadratic function well defined on $\mathcal{V}_a$, which leads to
a linear duality relation

\[ \varsigma = \delta V(\xi) = \alpha(\xi - \lambda). \]

Let $\mathcal{V}_a^* = \{\varsigma \in \mathbb{R} \mid \varsigma \ge -\alpha\lambda\}$ be the range of this duality mapping. So $(\xi, \varsigma)$
forms a canonical duality pair on $\mathcal{V}_a \times \mathcal{V}_a^*$, and the Legendre conjugate $V^*$ is
also a quadratic function:

\[ V^*(\varsigma) = \mathrm{sta}\{\xi\varsigma - \tfrac12\alpha(\xi - \lambda)^2 \;:\; \xi \in \mathcal{V}_a\} = \tfrac12\alpha^{-1}\varsigma^2 + \lambda\varsigma. \]

Thus, replacing $W(u) = V(\Lambda(u)) = \langle\Lambda(u); \varsigma\rangle - V^*(\varsigma)$ in $P(u) = W(u) - U(u)$,
the so-called total complementary function (Gao and Strang, 1989a, Gao,
2000a) can be defined by

\[ \Xi(u, \varsigma) = \langle\Lambda(u); \varsigma\rangle - V^*(\varsigma) - U(u)
   = \tfrac12|Bu|^2\varsigma - \tfrac12\alpha^{-1}\varsigma^2 - \lambda\varsigma - \langle u, f\rangle. \tag{8.52} \]

The criticality condition $\delta\Xi(u, \varsigma) = 0$ leads to the following canonical
equilibrium equations.


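\[ \tfrac12|Bu|^2 - \lambda = \alpha^{-1}\varsigma, \tag{8.53} \]
\[ \varsigma B^T B u = f. \tag{8.54} \]

Equation (8.53) is actually the inverse duality relation $\xi = \delta V^*(\varsigma)$, which
is equivalent to $\varsigma = \alpha(\tfrac12|Bu|^2 - \lambda)$. Thus, equation (8.54) is identical to
the Euler equation (8.48). This shows that the critical point of the total
complementary function is also a critical point of the primal problem. For a
fixed $\varsigma \ne 0$, solving (8.54) for $u$ gives

\[ u = \frac{1}{\varsigma}(B^T B)^{-1} f. \tag{8.55} \]

Substituting this result into the total complementary function leads to the
canonical dual function

\[ P^d(\varsigma) = -\frac{1}{2\varsigma} f^T (B^T B)^{-1} f - \lambda\varsigma - \tfrac12\alpha^{-1}\varsigma^2, \tag{8.56} \]

which is well defined on the dual feasible space given by

\[ \mathcal{V}_k^* = \{\varsigma \in \mathcal{V}_a^* \mid \varsigma \ne 0\} = \{\varsigma \in \mathbb{R} \mid \varsigma \ge -\alpha\lambda,\ \varsigma \ne 0\}. \]

The criticality condition $\delta P^d(\varsigma) = 0$ gives the canonical dual algebraic
equation:

\[ 2\varsigma^2(\alpha^{-1}\varsigma + \lambda) = f^T (B^T B)^{-1} f. \tag{8.57} \]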
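The substitution step and the resulting dual equation can be verified term by term. The sketch below assumes that $B^T B$ is invertible (that is, $B$ has full column rank) and uses the shorthand $\tau^2 = f^T (B^T B)^{-1} f$, which the text introduces shortly afterwards.

```latex
u(\varsigma) = \varsigma^{-1}(B^T B)^{-1} f
\;\Longrightarrow\;
\tfrac12 |Bu(\varsigma)|^2 \varsigma = \frac{\tau^2}{2\varsigma},
\qquad
\langle u(\varsigma), f\rangle = \frac{\tau^2}{\varsigma},
\\[4pt]
\Xi(u(\varsigma), \varsigma)
 = \frac{\tau^2}{2\varsigma} - \frac{\tau^2}{\varsigma}
   - \lambda\varsigma - \tfrac12\alpha^{-1}\varsigma^2
 = -\frac{\tau^2}{2\varsigma} - \lambda\varsigma - \tfrac12\alpha^{-1}\varsigma^2
 = P^d(\varsigma),
\\[4pt]
\frac{dP^d}{d\varsigma}
 = \frac{\tau^2}{2\varsigma^2} - \lambda - \alpha^{-1}\varsigma = 0
\;\Longleftrightarrow\;
2\varsigma^2\big(\alpha^{-1}\varsigma + \lambda\big) = \tau^2 .
```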


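Theorem 8.5. (Gao, 2000c) For any given parameters $\alpha, \lambda > 0$, and vector
$f \in \mathbb{R}^n$, the canonical dual function (8.56) has at most three critical points
$\bar\varsigma_i$ ($i = 1, 2, 3$) satisfying

\[ \bar\varsigma_1 > 0 > \bar\varsigma_2 \ge \bar\varsigma_3. \tag{8.58} \]

For each of these roots, the vector

\[ \bar u_i = (B^T B)^{-1} f / \bar\varsigma_i, \quad \text{for } i = 1, 2, 3, \tag{8.59} \]

is a critical point of the nonconvex function $P(u)$ in Problem (8.47), and we
have

\[ P(\bar u_i) = P^d(\bar\varsigma_i), \quad \forall i = 1, 2, 3. \tag{8.60} \]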


The original version of this theorem was first discovered in a postbifurcation
problem of a large deformed beam model in 1997 (Gao, 1997), which
shows that there is no duality gap between the nonconvex function $P(u)$ and
its canonical dual $P^d(\varsigma)$. The dual algebraic equation (8.57) can be solved
exactly to obtain all critical points; therefore the vectors $\{\bar u_i\}$ defined by (8.59)
yield a complete set of solutions to the nonlinear algebraic system (8.48).

Fig. 8.3 Graph of the dual algebraic equation (8.57) and a geometrical proof of the triality
theorem.


Let $\tau^2 = f^T (B^T B)^{-1} f$. In algebraic geometry, the graph of the algebraic
equation $\tau^2 = 2\varsigma^2(\alpha^{-1}\varsigma + \lambda)$ is the so-called singular algebraic curve in $(\varsigma, \tau)$-space
(i.e., the point $\varsigma = 0$ is on the curve; cf. Silverman and Tate, 1992).
From this algebraic curve, we can see that there exists a constant $\tau_c$ such that
if $\tau^2 > \tau_c^2$, the dual algebraic equation (8.57) has a unique solution $\varsigma > 0$. It
has three real solutions if and only if $\tau^2 < \tau_c^2$.
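The constant $\tau_c$ can be computed directly from the cubic. The short calculation below is a sketch that locates the local maximum of the right-hand side on the negative $\varsigma$-axis; it reproduces the value $8\alpha^2\lambda^3/27$ quoted in Theorem 8.6.

```latex
g(\varsigma) := 2\varsigma^2\big(\alpha^{-1}\varsigma + \lambda\big), \qquad
g'(\varsigma) = 6\alpha^{-1}\varsigma^2 + 4\lambda\varsigma
             = 2\varsigma\big(3\alpha^{-1}\varsigma + 2\lambda\big) = 0
\;\Longleftrightarrow\;
\varsigma = 0 \ \text{ or } \ \varsigma = -\tfrac23\alpha\lambda,
\\[4pt]
\tau_c^2 = g\big(-\tfrac23\alpha\lambda\big)
 = 2\cdot\tfrac49\alpha^2\lambda^2\cdot\big(-\tfrac23\lambda + \lambda\big)
 = \frac{8\alpha^2\lambda^3}{27}.
```

For $0 < \tau^2 < \tau_c^2$ the horizontal line at height $\tau^2$ therefore meets the graph of $g$ once for $\varsigma > 0$ and twice for $-\alpha\lambda < \varsigma < 0$.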

It is interesting to note that for $\varsigma > 0$, the total complementary function
$\Xi(u, \varsigma)$ is a saddle function and the well-known saddle min-max theory leads
to

\[ \min_u \max_{\varsigma > 0} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \max_{\varsigma > 0} \min_u \Xi(u, \varsigma). \tag{8.61} \]

This means that $\bar u$ is a global minimizer of $P(u)$ and $\bar\varsigma$ is a global maximizer
of $P^d(\varsigma)$ on the open domain $\varsigma > 0$. However, for $\varsigma < 0$, the total complementary
function $\Xi(u, \varsigma)$ is concave in both $u$ and $\varsigma < 0$; that is, it is a supercritical
function. Thus, by the biduality theory, we have that either

\[ \min_u \max_{\varsigma} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \min_{\varsigma} \max_u \Xi(u, \varsigma) \tag{8.62} \]

holds on a neighborhood of $(\bar u, \bar\varsigma)$, or

\[ \max_u \max_{\varsigma} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \max_{\varsigma} \max_u \Xi(u, \varsigma). \tag{8.63} \]

Actually, the extremality conditions can be easily viewed through the graph
of $P^d(\varsigma)$ (see Figure 8.4). To compare with this canonical dual function, the
graph of $P(u)$ for $n = 1$ is also shown in Figure 8.4. Precisely, we have the
following result (see Gao, 2000a,b).


Fig. 8.4 Graphs of $P(u)$ (dashed) and $P^d(\varsigma)$ (solid) for $n = 1$.


Theorem 8.6. (Complete Solutions for Problem $(\mathcal{P}_w)$ (Gao, 1998a,
2000a)) For given parameters $\alpha, \lambda > 0$, and vector $f \in \mathbb{R}^n$,
if $\tau^2 > \tau_c^2 = 8\alpha^2\lambda^3/27$, then the canonical dual function $P^d(\varsigma)$ has only
one critical point $\bar\varsigma > 0$, which is a global maximizer of $P^d(\varsigma)$, and $\bar u =
(B^T B)^{-1} f/\bar\varsigma$ is a global minimizer of $P(u)$.

If $\tau^2 < \tau_c^2$, the canonical dual function $P^d(\varsigma)$ has three critical points
$\bar\varsigma_1 > 0 > \bar\varsigma_2 > \bar\varsigma_3$ such that $\bar u_1$ is a global minimizer, $\bar u_2$ is a local minimizer,
and $\bar u_3$ is a local maximizer of $P(u)$.
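The statement of Theorem 8.6 can be illustrated numerically in the scalar case. The following is a minimal sketch, assuming $n = m = 1$ and $B = 1$; the parameter values and the names `f_load`, `P`, `P_dual`, `sigma`, and `u_bar` are hypothetical choices made for illustration, and `numpy.roots` is used only as a convenient cubic solver.

```python
import numpy as np

# Hypothetical sample data for the scalar case n = m = 1 with B = 1.
alpha, lam, f_load = 1.0, 2.0, 0.5


def P(u):
    """Primal function (8.47) for n = 1, B = 1."""
    return 0.5 * alpha * (0.5 * u**2 - lam) ** 2 - u * f_load


def P_dual(s):
    """Canonical dual function (8.56) for n = 1, B = 1 (tau^2 = f^2)."""
    return -f_load**2 / (2.0 * s) - lam * s - s**2 / (2.0 * alpha)


tau_sq = f_load**2
tau_c_sq = 8.0 * alpha**2 * lam**3 / 27.0  # threshold quoted in Theorem 8.6

# Dual algebraic equation (8.57) written as a cubic in s:
#   (2/alpha) s^3 + 2 lam s^2 - tau^2 = 0
roots = np.roots([2.0 / alpha, 2.0 * lam, 0.0, -tau_sq])
# Keep the real roots and sort them in descending order.
sigma = np.sort(roots[np.abs(roots.imag) < 1e-9].real)[::-1]

# Primal critical points recovered from (8.59).
u_bar = f_load / sigma

for s, u in zip(sigma, u_bar):
    print(f"sigma = {s: .6f}   u = {u: .6f}   "
          f"P(u) = {P(u): .6f}   Pd(sigma) = {P_dual(s): .6f}")
```

With these sample values $\tau^2 < \tau_c^2$, so three dual roots are expected, and the printed primal and dual values should agree pairwise, as in (8.60).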


8.5.3 Canonical Dual Solutions to Nonconvex
Variational Problems


Similar to the nonconvex optimization problem (8.47) with the double-well
function, let us now consider the following typical nonconvex variational
problem,

\[ (\mathcal{P}):\quad \min\Big\{ P(u) = \int_0^1 \tfrac12\alpha\big(\tfrac12 (u')^2 - \lambda\big)^2 dx - \int_0^1 f u \, dx \;:\; u \in \mathcal{U}_k \Big\}, \tag{8.64} \]

where $f(x)$ is a given function, $\lambda > 0$ is a parameter, and

\[ \mathcal{U}_k = \{u \in L^2[0,1] \mid u' \in L^4[0,1],\ u(0) = 0\} \]


is an admissible space. Compared with Problem (8.47), we see that the linear
operator $B$ in this case is the differential operator $d/dx$. This variational
problem appears frequently in association with phase transitions in fluids
and solids, and in postbuckling analysis of large deformed structures. The
criticality condition $\delta P(u) = 0$ leads to a nonlinear differential equation in
the domain $(0,1)$ with the natural boundary condition at $x = 1$; that is,

\[ \big[\alpha u'\big(\tfrac12 (u')^2 - \lambda\big)\big]' + f(x) = 0, \quad \forall x \in (0,1), \tag{8.65} \]
\[ \alpha u'\big(\tfrac12 (u')^2 - \lambda\big) = 0 \quad \text{at } x = 1. \tag{8.66} \]

Fig. 8.5 Zigzag function: Solution to the nonlinear boundary value problem (8.65).


Due to its nonlinearity, a solution to this boundary value problem is not
unique. In particular, if we let $f(x) = 0$, the equation (8.65) has
three real roots $u'(x) \in \{0, \pm\sqrt{2\lambda}\}$. Thus, any zigzag curve $u(x)$ with slopes in
$\{0, \pm\sqrt{2\lambda}\}$ solves the boundary value problem, but may not be a global
minimizer of the total energy $P(u)$. This problem shows an important fact: in
nonconvex analysis the criticality condition is only necessary, but not
sufficient, for solving variational problems. Traditional direct approaches for
solving nonconvex variational problems are very difficult, or impossible. However,
by using the canonical dual transformation, this problem can be solved
completely. To see this, we introduce a new "strain measure"


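\[ \xi = \Lambda(u) = \tfrac12 (u')^2, \]

such that the canonical functional

\[ V(\xi) = \int_0^1 \tfrac12\alpha(\xi - \lambda)^2 \, dx \]

is convex on $\mathcal{V}_a = \{\xi \in L^2[0,1] \mid \xi(x) \ge 0 \ \forall x \in (0,1)\}$, and the duality
relation $\varsigma = \delta V(\xi) = \alpha(\xi - \lambda)$ is one-to-one. Thus, its Legendre conjugate
can be simply obtained as

\[ V^*(\varsigma) = \mathrm{sta}\Big\{ \int_0^1 \xi\varsigma \, dx - V(\xi) \;:\; \xi \in \mathcal{V}_a \Big\}
   = \int_0^1 \big(\tfrac12\alpha^{-1}\varsigma^2 + \lambda\varsigma\big) dx. \]

Similar to (8.52), the total complementary function is

\[ \Xi(u, \varsigma) = \int_0^1 \big(\tfrac12 (u')^2\varsigma - \tfrac12\alpha^{-1}\varsigma^2 - \lambda\varsigma\big) dx - \int_0^1 u f \, dx. \tag{8.67} \]

For a given $\varsigma \ne 0$, the canonical dual functional can be obtained as

\[ P^d(\varsigma) = \mathrm{sta}\{\Xi(u, \varsigma) \;:\; u \in \mathcal{U}_k\}
   = -\int_0^1 \Big(\frac{\tau^2}{2\varsigma} + \lambda\varsigma + \tfrac12\alpha^{-1}\varsigma^2\Big) dx, \tag{8.68} \]

where $\tau(x)$ is defined by

\[ \tau(x) = -\int_0^x f(s) \, ds + c, \tag{8.69} \]

and the integral constant $c$ depends on the boundary condition. The criticality
condition $\delta P^d(\varsigma) = 0$ leads to the dual equilibrium equation

\[ 2\varsigma^2(\alpha^{-1}\varsigma + \lambda) = \tau^2. \tag{8.70} \]

This algebraic equation is the same as (8.57), which can be solved analytically
as stated below.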


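Theorem 8.7. (Analytical Solutions and Triality Theorem (Gao,
1998a, 2000b)) For any given input function $f(x)$ such that $\tau(x)$ is
defined by (8.69), the dual algebraic equation (8.70) has at most three real roots
$\bar\varsigma_i$ ($i = 1, 2, 3$) satisfying

\[ \bar\varsigma_1(x) > 0 > \bar\varsigma_2(x) \ge \bar\varsigma_3(x). \]

For each $\bar\varsigma_i$, the function

\[ \bar u_i(x) = \int_0^x \frac{\tau}{\bar\varsigma_i} \, dx \tag{8.71} \]

is a critical point of the variational problem (8.64). Moreover, $\bar u_1(x)$ is a
global minimizer, $\bar u_2(x)$ is a local minimizer, and $\bar u_3(x)$ is a local maximizer;
that is,

\[ P(\bar u_1) = \min_u \max_{\varsigma > 0} \Xi(u, \varsigma) = \max_{\varsigma > 0} \min_u \Xi(u, \varsigma) = P^d(\bar\varsigma_1); \tag{8.72} \]
\[ P(\bar u_2) = \min_u \max_{\varsigma \in (\bar\varsigma_3, 0)} \Xi(u, \varsigma) = \min_{\varsigma \in (\bar\varsigma_3, 0)} \max_u \Xi(u, \varsigma) = P^d(\bar\varsigma_2); \tag{8.73} \]
\[ P(\bar u_3) = \max_u \max_{\varsigma < \bar\varsigma_2} \Xi(u, \varsigma) = \max_{\varsigma < \bar\varsigma_2} \max_u \Xi(u, \varsigma) = P^d(\bar\varsigma_3). \tag{8.74} \]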
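The pointwise construction in Theorem 8.7 can be sketched numerically. The example below assumes a constant load $f(x) = f_0$ and fixes the constant in (8.69) by $\tau(1) = 0$, which is what the natural boundary condition (8.66) requires; the parameter values and the helper name `positive_dual_root` are hypothetical. It solves (8.70) on a grid for the positive root $\bar\varsigma_1(x)$, forms the slope $\bar u_1' = \tau/\bar\varsigma_1$ from (8.71), and checks the once-integrated Euler equation $\alpha u'(\tfrac12 (u')^2 - \lambda) = \tau$.

```python
import numpy as np

# Hypothetical parameters and a constant load f(x) = f0.
alpha, lam, f0 = 1.0, 1.0, 0.5


def positive_dual_root(tau_sq, alpha, lam):
    """Unique positive root of 2 s^2 (s/alpha + lam) = tau_sq, for tau_sq > 0."""
    roots = np.roots([2.0 / alpha, 2.0 * lam, 0.0, -tau_sq])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[real > 0].max()


# Grid that stops short of x = 1, where tau vanishes and sigma_1 tends to 0.
x = np.linspace(0.0, 0.9, 10)

# tau(x) = -int_0^x f ds + c with c = f0, so that tau(1) = 0.
tau = f0 * (1.0 - x)

sigma1 = np.array([positive_dual_root(t**2, alpha, lam) for t in tau])
du1 = tau / sigma1  # slope u_1'(x), the integrand of (8.71)

# Residual of the once-integrated Euler equation (8.65)-(8.66).
residual = alpha * du1 * (0.5 * du1**2 - lam) - tau

# Trapezoidal integration of the slope gives u_1 on the grid, with u_1(0) = 0.
u1 = np.concatenate(([0.0], np.cumsum(0.5 * (du1[1:] + du1[:-1]) * np.diff(x))))

print("max |residual| =", np.abs(residual).max())
```

Because $\bar\varsigma_1 = \alpha(\tfrac12 (\bar u_1')^2 - \lambda) > 0$, the computed slope stays above $\sqrt{2\lambda}$ everywhere on the grid.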


As a complete theory, the triality theorem was first discovered in
postbuckling analysis of large deformed elastic beam models (Gao, 1997). The
biduality theory was developed two years later during the writing of the
monograph Gao (2000a). However, the original idea of the canonical dual
transformation and the saddle-min-max theorem (8.72) came from the joint
work by Gao and Strang in the study of complementary variational
problems in nonconvex/nonsmooth boundary value problems (Gao and Strang,
1989a,b). Theorems 8.5 and 8.7 were also first proposed in the context of
continuum mechanics (see Section 8.7, and Gao, 1999a,c, 2000b, Li and Gupta,
2006).


8.6 Canonical Duality Theory in General Nonconvex 
Systems 


In this section, we discuss the canonical dual transformation and its
associated triality theory for solving the following general nonconvex problem

\[ (\mathcal{P}):\quad \min\{P(u) = W(u) - U(u) \;:\; \forall u \in \mathcal{U}_k\}, \tag{8.75} \]

where $W(u)$ is a general nonconvex function on an open set $\mathcal{U}_a \subset \mathcal{U}$,
$U : \mathcal{U}_a \to \mathbb{R}$ is a Gâteaux differentiable function, either linear or canonical, and
$\mathcal{U}_k \subset \mathcal{U}_a$ is a feasible space. The canonical dual transformation for solving
more general problems can be found in Gao (1998a, 2000a,c).


8.6.1 Canonical Dual Transformation and Framework 


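The key idea of the canonical dual transformation is to choose a Gâteaux
differentiable geometrical operator $\xi = \Lambda(u) : \mathcal{U}_a \to \mathcal{V}_a$ and a canonical
function $V(\xi) : \mathcal{V}_a \to \mathbb{R}$ such that the nonconvex function $W(u)$ can be
written as

\[ W(u) = V(\Lambda(u)). \tag{8.76} \]

Because $V(\xi)$ is a canonical function on $\mathcal{V}_a$, its Legendre conjugate can be
defined uniquely on $\mathcal{V}_a^* \subset \mathcal{V}^*$ by

\[ V^*(\varsigma) = \mathrm{sta}\{\langle\xi; \varsigma\rangle - V(\xi) \;:\; \forall \xi \in \mathcal{V}_a\}, \tag{8.77} \]

and on $\mathcal{V}_a \times \mathcal{V}_a^*$, we have

\[ \varsigma = \delta V(\xi) \;\Leftrightarrow\; \xi = \delta V^*(\varsigma) \;\Leftrightarrow\; \langle\xi; \varsigma\rangle = V(\xi) + V^*(\varsigma). \tag{8.78} \]

Replacing $W(u)$ by $V(\Lambda(u))$ and letting $\mathcal{U}_k = \{u \in \mathcal{U}_a \mid \Lambda(u) \in \mathcal{V}_a\}$, the
primal problem $(\mathcal{P})$ can be written in the canonical form:

\[ (\mathcal{P}):\quad \min\{P(u) = V(\Lambda(u)) - U(u) \;:\; \forall u \in \mathcal{U}_k\}. \tag{8.79} \]

Because $\Lambda(u)$ is Gâteaux differentiable, by the chain rule we have $\delta V(\Lambda(u)) =
\Lambda_t(u)\delta_\xi V(\Lambda(u))$, where $\Lambda_t(u)$ is the Gâteaux derivative of $\Lambda(u)$ and $\delta_\xi V(\Lambda(u))$
represents the Gâteaux derivative of $V$ with respect to $\xi = \Lambda(u)$. The
adjoint $\Lambda_t^*(u)$ of $\Lambda_t(u)$ is defined by

\[ \langle\Lambda_t(u)u; \varsigma\rangle = \langle u, \Lambda_t^*(u)\varsigma\rangle. \]

Thus, the criticality condition $\delta P(u) = 0$ leads to the canonical equilibrium
equation

\[ \Lambda_t^*(u)\delta_\xi V(\Lambda(u)) - \delta U(u) = 0. \tag{8.80} \]

In terms of the canonical duality pair $(\xi, \varsigma)$, the canonical equilibrium
equation (8.80) can be written in the tricanonical forms:

(a) Geometrical equation: $\Lambda(u) = \xi$.
(b) Constitutive equation: $\delta V(\xi) = \varsigma$. (8.81)
(c) Balance equation: $\Lambda_t^*(u)\varsigma = \delta U(u)$.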
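As a concrete instance, the tricanonical forms can be written out for the double-well problem (8.47). The sketch below takes $\Lambda(u) = \tfrac12|Bu|^2$, $V(\xi) = \tfrac12\alpha(\xi - \lambda)^2$, and $U(u) = \langle u, f\rangle$ from Section 8.5.2; eliminating $\xi$ and $\varsigma$ recovers the Euler equation (8.48).

```latex
\Lambda_t(u)\,\delta u = (Bu)^T B\,\delta u,
\qquad
\Lambda_t^*(u)\,\varsigma = \varsigma\, B^T B u,
\\[4pt]
\text{(a)}\ \ \xi = \tfrac12 |Bu|^2,
\qquad
\text{(b)}\ \ \varsigma = \alpha(\xi - \lambda),
\qquad
\text{(c)}\ \ \varsigma\, B^T B u = f,
\\[4pt]
\text{(a)--(c)}
\;\Longrightarrow\;
\alpha\big(\tfrac12 |Bu|^2 - \lambda\big) B^T B u = f .
```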


In many applications, where the function $U(u)$ is usually linear on $\mathcal{U}_a$, the
nonlinearity of the problem $(\mathcal{P})$ mainly depends on $\Lambda$ and $V$. In this case,
the nonlinearities of the general nonconvex problem can be classified by the
following definition (Gao, 2000a).


Definition 8.4. (Nonlinearity Classification) The problem $(\mathcal{P})$ is said to
be geometrically nonlinear if the operator $\Lambda(u)$ is nonlinear, physically
nonlinear if the constitutive relation $\varsigma = \delta V(\xi)$ is nonlinear, and fully nonlinear
if it is both geometrically and physically nonlinear.


Generally speaking, the nonconvexity of $P(u)$ is mainly due to the
geometrical nonlinearity. For a nonlinear operator $\Lambda(u)$, the following operator
decomposition introduced by Gao and Strang (1989a) plays an important
role in canonical duality theory,

\[ \Lambda(u) = \Lambda_t(u)u + \Lambda_c(u), \tag{8.82} \]

where $\Lambda_c(u) = \Lambda(u) - \Lambda_t(u)u$ is the so-called complementary operator of $\Lambda_t$.
By this decomposition (8.82), Gao and Strang discovered, in the case where
$U(u)$ is a linear function, that the duality gap existing in classical Lagrangian
duality theory can be naturally recovered by the so-called complementary gap
function defined by

\[ G_c(u, \varsigma) = -\langle\Lambda_c(u); \varsigma\rangle. \tag{8.83} \]

The diagrammatic representation for a fully nonlinear canonical system is
given in Figure 8.6.

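\[
\begin{array}{ccc}
u \in \mathcal{U}_a \subset \mathcal{U} & \longleftarrow\ \langle u, u^*\rangle\ \longrightarrow & \mathcal{U}^* \supset \mathcal{U}_a^* \ni u^* \\
\Lambda_t + \Lambda_c = \Lambda \ \downarrow & & \uparrow\ \Lambda_t^* = (\Lambda - \Lambda_c)^* \\
\xi \in \mathcal{V}_a \subset \mathcal{V} & \longleftarrow\ \langle\xi; \varsigma\rangle\ \longrightarrow & \mathcal{V}^* \supset \mathcal{V}_a^* \ni \varsigma
\end{array}
\]

Fig. 8.6 Diagrammatic representation in fully nonlinear systems.


Based on the canonical form of the primal problem (8.79), the total
complementary function $\Xi : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ can be formulated as

\[ \Xi(u, \varsigma) = \langle\Lambda(u); \varsigma\rangle - V^*(\varsigma) - U(u), \tag{8.84} \]

which is also called the generalized complementary energy in nonconvex
variational problems and continuum mechanics (Gao and Strang, 1989a, Gao,
2000a), or the nonlinear Lagrangian in global optimization (Gao, 2000c). For
each fixed $u \in \mathcal{U}_a$, the mapping $\Xi(u, \cdot) : \mathcal{V}_a^* \to \mathbb{R}$ is a canonical function.
However, the property of the mapping $\Xi(\cdot, \varsigma) : \mathcal{U}_a \to \mathbb{R}$ will depend on the
geometrical operator $\Lambda(u)$. Therefore, for a given $\varsigma \in \mathcal{V}_a^*$, we introduce a new
(parametric) function

\[ G_\varsigma(u) := \langle\Lambda(u); \varsigma\rangle - U(u), \quad \forall u \in \mathcal{U}_a. \tag{8.85} \]

Clearly, for a fixed $\varsigma \in \mathcal{V}_a^*$, the criticality condition

\[ \delta G_\varsigma(\bar u; u) = \langle\Lambda_t(\bar u)u; \varsigma\rangle - \delta U(\bar u; u) = 0, \quad \forall u \in \mathcal{U}_a \]

leads to the balance equation $\Lambda_t^*(\bar u)\varsigma - \delta U(\bar u) = 0$. This function plays an
important role in canonical duality theory. By introducing the so-called
canonical dual feasible space $\mathcal{V}_k^*$ defined by

\[ \mathcal{V}_k^* = \{\varsigma \in \mathcal{V}_a^* \mid \Lambda_t^*(u)\varsigma = \delta U(u),\ \forall u \in \mathcal{U}_a\}, \tag{8.86} \]

the canonical dual function $P^d : \mathcal{V}_k^* \to \mathbb{R}$ can be formulated via $\Xi(u, \varsigma)$ as

\[ P^d(\varsigma) = \mathrm{sta}\{\Xi(u, \varsigma) \;:\; u \in \mathcal{U}_a\} = U^\Lambda(\varsigma) - V^*(\varsigma), \tag{8.87} \]

where $U^\Lambda : \mathcal{V}_a^* \to \mathbb{R}$ is called the $\Lambda$-conjugate transformation of $U$, defined by
(see Gao, 2000a)

\[ U^\Lambda(\varsigma) = \mathrm{sta}\{\langle\Lambda(u); \varsigma\rangle - U(u) \;:\; u \in \mathcal{U}_a\}. \tag{8.88} \]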


Theorem 8.8. (Canonical Dual Transformation (Gao, 2000a)) The
function

\[ P^d(\varsigma) = U^\Lambda(\varsigma) - V^*(\varsigma) : \mathcal{V}_k^* \to \mathbb{R} \]

is canonically dual to $P(u) = V(\Lambda(u)) - U(u) : \mathcal{U}_k \to \mathbb{R}$ in the sense that if
$(\bar u, \bar\varsigma)$ is a critical point of $\Xi(u, \varsigma)$, then $\bar u$ is a critical point of $P(u)$, $\bar\varsigma$ is a
critical point of $P^d(\varsigma)$, and

\[ P(\bar u) = \Xi(\bar u, \bar\varsigma) = P^d(\bar\varsigma). \tag{8.89} \]

This theorem can be easily proved by examining the criticality condition
$\delta\Xi(\bar u, \bar\varsigma) = 0$, which leads to the following canonical Lagrangian equations,

\[ \Lambda(\bar u) = \delta V^*(\bar\varsigma), \qquad \Lambda_t^*(\bar u)\bar\varsigma = \delta U(\bar u), \tag{8.90} \]

which are equivalent to the tricanonical forms (8.81) because $V^*(\varsigma)$ is a
canonical function on $\mathcal{V}_a^*$. Thus, $\bar u$ is a critical point of $P(u)$. By the definition of
the canonical dual function, $\bar\varsigma$ is also a critical point of $P^d(\varsigma)$.


Theorem 8.8 shows that there is no duality gap between the primal
function and its canonical dual. Actually, in the case where $U(u) = \langle u, f\rangle$ is a
linear function, we have

\[ U^\Lambda(\bar\varsigma) = G_{\bar\varsigma}(\bar u) = \langle\Lambda_c(\bar u); \bar\varsigma\rangle = -G_c(\bar u, \bar\varsigma) \quad \text{s.t. } \Lambda_t^*(\bar u)\bar\varsigma = f; \]

that is, the duality gap is recovered by the complementary gap function
$G_c(\bar u, \bar\varsigma)$. In this case, the function $P^c(u, \varsigma) = -\Xi(u, \varsigma)$ defined by

\[ P^c(u, \varsigma) = G_c(u, \varsigma) + V^*(\varsigma) \tag{8.91} \]

is the total complementary energy introduced by Gao and Strang (1989a).
They proved that if $(\bar u, \bar\varsigma)$ is a critical point of $P^c(u, \varsigma)$, then $\bar u$ is
a critical point of $P(u)$, and $P(\bar u) + P^c(\bar u, \bar\varsigma) = 0$. The operator $\Lambda(u)$ is
usually nonlinear in nonconvex problems; therefore the explicit format of the
canonical dual function $P^d(\varsigma)$ will depend on the properties of the function
$G_\varsigma(u)$. By the implicit function theory, if $\Lambda(u)$ is twice Gâteaux differentiable
and the second Gâteaux differential satisfies

\[ \delta^2 G_\varsigma(\bar u; \delta u^2) \ne 0 \quad \forall \delta u \ne 0, \tag{8.92} \]

then $U^\Lambda(\varsigma)$ can be formulated explicitly by the $\Lambda$-conjugate transformation
(8.88). Some simple illustrative examples are given below.


Example 8.1. Recall the nonconvex optimization problem with the
double-well function (8.47):

\[ \min\{P(u) = \tfrac12\alpha(\tfrac12|Bu|^2 - \lambda)^2 - \langle u, f\rangle \;:\; u \in \mathbb{R}^n\}, \]

where $W(u)$ is a double-well function and $U(u)$ is a linear function. If we
choose $\xi = \Lambda(u) = \tfrac12|Bu|^2$ as a quadratic operator, then we have $\Lambda_t(u) =
(Bu)^T B$ and $\Lambda_c(u) = \Lambda(u) - \Lambda_t(u)u = -\tfrac12|Bu|^2$. Because for each $\varsigma \ne 0$,
$G_\varsigma(u) = \tfrac12|Bu|^2\varsigma - \langle u, f\rangle$ is a quadratic function and $\delta^2 G_\varsigma(u) = \varsigma B^T B$, the
$\Lambda$-conjugate $U^\Lambda$ is well defined by

\[ U^\Lambda(\varsigma) = \mathrm{sta}\{\tfrac12|Bu|^2\varsigma - \langle u, f\rangle \;:\; u \in \mathbb{R}^n\} = -\frac{1}{2\varsigma} f^T (B^T B)^{-1} f. \]

The complementary gap function in this case is $G_c(u, \varsigma) = \tfrac12|Bu|^2\varsigma$. Clearly,
for any $u \in \mathbb{R}^n$ and $u \ne 0$, $G_c(u, \varsigma) > 0$ if and only if $\varsigma > 0$. Thus, the total
complementary function $\Xi(u, \varsigma)$ given by (8.52) is a saddle function for $\varsigma > 0$.
This leads to the saddle min-max duality (8.61) in the triality theory.
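Example 8.1 can be checked numerically for a small matrix $B$. The sketch below is illustrative only: the matrix `B`, the vector `f`, the parameter values, and all function names are hypothetical, and $B$ is assumed to have full column rank so that $B^T B$ is invertible. It verifies the closed form of $U^\Lambda(\varsigma)$ at two sample values of $\varsigma$, then recovers the critical points of $P(u)$ from the roots of (8.57).

```python
import numpy as np

# Hypothetical data: B must have full column rank so that B^T B is invertible.
alpha, lam = 1.0, 1.0
B = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
f = np.array([0.3, -0.2])

K = B.T @ B                      # B^T B
K_inv_f = np.linalg.solve(K, f)  # (B^T B)^{-1} f
tau_sq = f @ K_inv_f             # tau^2 = f^T (B^T B)^{-1} f


def G(u, s):
    """Parametric function G_s(u) = (1/2)|Bu|^2 s - <u, f>."""
    return 0.5 * s * (B @ u) @ (B @ u) - u @ f


def P(u):
    """Primal function (8.47)."""
    return 0.5 * alpha * (0.5 * (B @ u) @ (B @ u) - lam) ** 2 - u @ f


def grad_P(u):
    """Left-hand side minus right-hand side of the Euler equation (8.48)."""
    return alpha * (0.5 * (B @ u) @ (B @ u) - lam) * (K @ u) - f


def P_dual(s):
    """Canonical dual function (8.56)."""
    return -tau_sq / (2.0 * s) - lam * s - s**2 / (2.0 * alpha)


# Lambda-conjugate: G_s is stationary at u = (B^T B)^{-1} f / s.
conjugate_errors = [abs(G(K_inv_f / s, s) + tau_sq / (2.0 * s)) for s in (0.7, -0.4)]

# Real roots of the dual algebraic equation (8.57), in descending order.
roots = np.roots([2.0 / alpha, 2.0 * lam, 0.0, -tau_sq])
sigma = np.sort(roots[np.abs(roots.imag) < 1e-9].real)[::-1]

u_bars = [K_inv_f / s for s in sigma]
grad_norms = [np.linalg.norm(grad_P(u)) for u in u_bars]
value_gaps = [abs(P(u) - P_dual(s)) for u, s in zip(u_bars, sigma)]

print("dual roots:", sigma)
print("max |grad P| at recovered points:", max(grad_norms))
print("max |P - Pd|:", max(value_gaps))
```

The Hessian of $G_\varsigma$ is $\varsigma B^T B$, so its definiteness is decided by the sign of $\varsigma$ alone; this is why the positive root is the one associated with the global minimizer.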


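Example 8.2. In the nonconvex variational problem (8.64), the quadratic
differential operator $\xi = \Lambda(u) = \tfrac12 (u')^2$ has a physical meaning. In finite
deformation theory, if $u$ is considered as the displacement of a deformed body,
then $\xi$ can be considered as a Cauchy-Green strain measure (see the following
section). The Gâteaux derivative of the quadratic differential operator $\Lambda(u)$
is $\Lambda_t(u) = u' \, d/dx$. For any given $u \in \mathcal{U}_a$, using integration by parts, we get

\[ \langle\Lambda_t(u)u; \varsigma\rangle = \int_0^1 u' u' \varsigma \, dx = \big[u u' \varsigma\big]_{x=0}^{x=1} - \int_0^1 u\,[u'\varsigma]' \, dx = \langle u, \Lambda_t^*(u)\varsigma\rangle, \]

which gives the adjoint operator $\Lambda_t^*$ via

\[ \Lambda_t^*(u)\varsigma = \begin{cases} u'\varsigma & \text{on } x = 1, \\ -[u'\varsigma]' & \forall x \in (0,1). \end{cases} \]

For any given $\varsigma \in \mathcal{V}_a^*$, the $\Lambda$-conjugate transformation is

\[ U^\Lambda(\varsigma) = \mathrm{sta}\{\langle\Lambda(u), \varsigma\rangle - U(u) \;:\; u \in \mathcal{U}_k\} = -\int_0^1 \tfrac12\tau^2\varsigma^{-1} \, dx. \]

The complementary operator in this problem is $\Lambda_c(u) = \Lambda(u) - \Lambda_t(u)u =
-\tfrac12 (u')^2$, which leads to the complementary gap function

\[ G_c(u, \varsigma) = \int_0^1 \tfrac12 (u')^2 \varsigma \, dx. \]

Clearly, this is positive if $\varsigma > 0$.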


8.6.2 Extremality Conditions: Triality Theory 


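In order to study the extremality conditions of the nonconvex problem, we
need to clarify the convexity of the canonical function $V(\xi)$. Without loss of
generality, we assume that $V : \mathcal{V}_a \to \mathbb{R}$ is convex. Thus, for each $u \in \mathcal{U}_a$, the
total complementary function

\[ \Xi(u, \varsigma) = \langle\Lambda(u); \varsigma\rangle - V^*(\varsigma) - U(u) : \mathcal{V}_a^* \to \mathbb{R} \]

is concave in $\varsigma \in \mathcal{V}_a^*$. The convexity of $\Xi(\cdot, \varsigma) : \mathcal{U}_a \to \mathbb{R}$ will depend on the
geometrical operator $\Lambda(u)$ and the function $U(u)$. We furthermore assume
that the function $G_\varsigma(u) = \langle\Lambda(u); \varsigma\rangle - U(u) : \mathcal{U}_a \to \mathbb{R}$ is twice Gâteaux
differentiable on $\mathcal{U}_a$ and let

\[ \mathcal{G} := \{(u, \varsigma) \in \mathcal{U}_a \times \mathcal{V}_a^* \mid \delta^2 G_\varsigma(u; \delta u^2) \ne 0,\ \forall \delta u \ne 0\}, \tag{8.93} \]
\[ \mathcal{G}^+ := \{(u, \varsigma) \in \mathcal{U}_a \times \mathcal{V}_a^* \mid \delta^2 G_\varsigma(u; \delta u^2) > 0,\ \forall \delta u \ne 0\}, \tag{8.94} \]
\[ \mathcal{G}^- := \{(u, \varsigma) \in \mathcal{U}_a \times \mathcal{V}_a^* \mid \delta^2 G_\varsigma(u; \delta u^2) < 0,\ \forall \delta u \ne 0\}. \tag{8.95} \]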
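Theorem 8.9. (Triality Theorem) Suppose that $(\bar u, \bar\varsigma) \in \mathcal{G}$ is a critical
point of $\Xi(u, \varsigma)$ and $\mathcal{U}_r \times \mathcal{V}_r^* \subset \mathcal{U}_a \times \mathcal{V}_a^*$ is a neighborhood of $(\bar u, \bar\varsigma)$.
If $(\bar u, \bar\varsigma) \in \mathcal{G}^+$, then $(\bar u, \bar\varsigma)$ is a saddle point of $\Xi(u, \varsigma)$; that is,

\[ \min_{u \in \mathcal{U}_r} \max_{\varsigma \in \mathcal{V}_r^*} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \max_{\varsigma \in \mathcal{V}_r^*} \min_{u \in \mathcal{U}_r} \Xi(u, \varsigma). \tag{8.96} \]

If $(\bar u, \bar\varsigma) \in \mathcal{G}^-$, then $(\bar u, \bar\varsigma)$ is a supercritical point of $\Xi(u, \varsigma)$, and we have that
either

\[ \min_{u \in \mathcal{U}_r} \max_{\varsigma \in \mathcal{V}_r^*} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \min_{\varsigma \in \mathcal{V}_r^*} \max_{u \in \mathcal{U}_r} \Xi(u, \varsigma) \tag{8.97} \]

holds, or

\[ \max_{u \in \mathcal{U}_r} \max_{\varsigma \in \mathcal{V}_r^*} \Xi(u, \varsigma) = \Xi(\bar u, \bar\varsigma) = \max_{\varsigma \in \mathcal{V}_r^*} \max_{u \in \mathcal{U}_r} \Xi(u, \varsigma). \tag{8.98} \]

Proof. By the assumption on the canonical function $V(\xi)$, we know that
$\Xi(u, \varsigma)$ is concave on $\mathcal{V}_a^*$. Because $G_\varsigma(u)$ is twice Gâteaux differentiable on
$\mathcal{U}_a$, the theory of implicit functions tells us that if $(\bar u, \bar\varsigma) \in \mathcal{G}$, then there exists
a unique $\bar u \in \mathcal{U}_r \subset \mathcal{U}_a$ such that the dual feasible set $\mathcal{V}_k^*$ is nonempty. If such
a point $(\bar u, \bar\varsigma) \in \mathcal{G}^+$, then $G_{\bar\varsigma}(u)$ is convex in $u$ and $(\bar u, \bar\varsigma)$ is a saddle point of
$\Xi$ on $\mathcal{U}_r \times \mathcal{V}_r^*$. The saddle-Lagrangian duality leads to (8.96). If $(\bar u, \bar\varsigma) \in \mathcal{G}^-$,
then $G_{\bar\varsigma}(u)$ is locally concave in $u$ and $(\bar u, \bar\varsigma)$ is a supercritical point of $\Xi(u, \varsigma)$
on $\mathcal{U}_r \times \mathcal{V}_r^*$. In this case the biduality theory leads to (8.97) and (8.98).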


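If the geometrical operator $\Lambda(u)$ is a quadratic function and $U(u)$ is either
quadratic or linear, then the second-order Gâteaux derivative $\delta^2 G_\varsigma(u)$ does
not depend on $u$. In this case, we let

\[ \mathcal{V}_+^* := \{\varsigma \in \mathcal{V}_a^* \mid \delta^2 G_\varsigma(u) \text{ is positive definite}\}, \tag{8.99} \]
\[ \mathcal{V}_-^* := \{\varsigma \in \mathcal{V}_a^* \mid \delta^2 G_\varsigma(u) \text{ is negative definite}\}. \tag{8.100} \]

The following theorem provides extremality criteria for critical points of
$\Xi(u, \varsigma)$.


Theorem 8.10. (Triduality Theorem (Gao, 1998a, 2000a)) Suppose
that $G_\varsigma(u) = \langle\Lambda(u); \varsigma\rangle - U(u)$ is a quadratic function of $u \in \mathcal{U}_a$ and $(\bar u, \bar\varsigma)$ is
a critical point of $\Xi(u, \varsigma)$.

If $\bar\varsigma \in \mathcal{V}_+^*$, then $\bar u$ is a global minimizer of $P(u)$ on $\mathcal{U}_k$ if and only if $\bar\varsigma$ is a
global maximizer of $P^d(\varsigma)$ on $\mathcal{V}_+^*$, and

\[ P(\bar u) = \min_{u \in \mathcal{U}_k} P(u) = \max_{\varsigma \in \mathcal{V}_+^*} P^d(\varsigma) = P^d(\bar\varsigma). \tag{8.101} \]

If $\bar\varsigma \in \mathcal{V}_-^*$, then on the neighborhood $\mathcal{U}_r \times \mathcal{V}_r^* \subset \mathcal{U}_a \times \mathcal{V}_-^*$ of $(\bar u, \bar\varsigma)$, we have that
either

\[ P(\bar u) = \min_{u \in \mathcal{U}_r} P(u) = \min_{\varsigma \in \mathcal{V}_r^*} P^d(\varsigma) = P^d(\bar\varsigma) \tag{8.102} \]

holds, or

\[ P(\bar u) = \max_{u \in \mathcal{U}_r} P(u) = \max_{\varsigma \in \mathcal{V}_r^*} P^d(\varsigma) = P^d(\bar\varsigma). \tag{8.103} \]

This theorem shows that the canonical dual solution $\bar\varsigma \in \mathcal{V}_+^*$ provides a
global optimality condition for the nonconvex primal problem, whereas the
condition $\bar\varsigma \in \mathcal{V}_-^*$ provides local extremality conditions.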

The triality theory was originally discovered in nonconvex mechanics (Gao,
1997, 1999c). Since then, several modified versions have been proposed in
nonconvex parametrical variational problems (for quadratic $\Lambda(u)$ and
linear $U(u)$ (Gao, 1998a)), general nonconvex systems (for nonlinear $\Lambda(u)$
and linear $U(u)$ (Gao, 2000a)), global optimization (for general nonconvex
functions of type $\Phi(u, \Lambda(u))$ (Gao, 2000c), quadratic $U(u)$ (Gao, 2003a,b)),
and dissipative Hamiltonian systems (for nonconvex/nonsmooth functions of
type $\Phi(u, u, z, \Lambda(u))$ (Gao, 2001c)). In terms of the parametrical function
$G_\varsigma(u) = \langle\Lambda(u); \varsigma\rangle - U(u)$, the current version (Theorems 8.9 and 8.10) can be
used for solving the general nonconvex problem (8.75) with the canonical function
$U(u)$.


8.6.3 Complementary Variational Principles in Finite 
Deformation Theory 


In finite deformation theory, the deformation $u(x)$ is a smooth, vector-valued
mapping from an open, simply connected, and bounded domain $\Omega \subset \mathbb{R}^n$ into
a deformed domain² $\omega \subset \mathbb{R}^m$. Let $\Gamma = \partial\Omega = \Gamma_u \cup \Gamma_t$ be the boundary of
$\Omega$ such that on $\Gamma_u$, the boundary condition $u(x) = \bar u$ is prescribed, whereas
on the remaining boundary $\Gamma_t$, the surface traction (external force) $t(x)$ is
applied. Similar to the nonconvex optimization problem (8.47), the primal
problem is to minimize the total potential energy functional:

\[ \min\Big\{ P(u) = \int_\Omega [W(\nabla u) - u \cdot f] \, d\Omega - \int_{\Gamma_t} u \cdot t \, d\Gamma \;:\; u = \bar u \text{ on } \Gamma_u \Big\}, \tag{8.104} \]

where the stored energy $W(F)$ is a Gâteaux differentiable function of $F = \nabla u$,
and $f(x)$ is a given force field. Because the deformation gradient $F = \nabla u \in
\mathbb{R}^{m\times n}$ is a so-called two-point tensor, which is no longer a strain measure
in finite deformation theory, the stored energy $W(F)$ is usually nonconvex.
Particularly, for St. Venant-Kirchhoff material (see Gao, 2000a), we have

\[ W(F) = \tfrac12 \big[\tfrac12(F^T F - I)\big] : D : \big[\tfrac12(F^T F - I)\big], \tag{8.105} \]

where $I$ is an identity tensor in $\mathbb{R}^{n\times n}$.

² If $m = n + 1$, then the deformation $u(x)$ represents a hypersurface in $m$-dimensional
space. Applications of the canonical duality theory in differential geometry were discussed
in Gao and Yang (1995).

Due to nonconvexity, the duality relation

\[ \tau = \delta W(F) \]

is not one-to-one. Although the two-point tensor $\tau \in \mathbb{R}^{m\times n}$ is called the
first Piola-Kirchhoff stress, according to Hill's constitutive theory, $(F, \tau)$ is
not considered as a work-conjugate (canonical) strain-stress pair (see Gao,
2000a). The Fenchel-Rockafellar type dual variational problem is


\[ \max\Big\{ P^\sharp(\tau) = \int_{\Gamma_u} \bar u \cdot \tau \cdot n \, d\Gamma - \int_\Omega W^\sharp(\tau) \, d\Omega \Big\} \tag{8.106} \]
\[ \text{s.t.} \quad -\nabla \cdot \tau^T = f \ \text{in } \Omega, \qquad n \cdot \tau^T = t \ \text{on } \Gamma_t. \tag{8.107} \]

In the case where the stored energy $W(F)$ is convex, $W^\sharp(\tau) = W^*(\tau)$,
which is called the complementary energy in elasticity. In this case, the
functional

\[ \pi(\tau) = \int_\Omega W^*(\tau) \, d\Omega - \int_{\Gamma_u} \bar u \cdot \tau \cdot n \, d\Gamma \]

is the well-known Levinson-Zubov complementary energy. As discussed
before, if the stored energy $W(F)$ is nonconvex, the Legendre conjugate $W^*$ is
not uniquely defined. It turns out that the Levinson-Zubov complementary
variational principle can be used only for solving convex problems (see Gao,
1992). Although the Fenchel conjugate $W^\sharp(\tau)$ can be uniquely defined, the
Fenchel-Young inequality $W(F) + W^\sharp(\tau) \ge \langle F; \tau\rangle$ leads to a duality gap
between the minimal potential variational problem (8.104) and its
Fenchel-Rockafellar dual (see Gao, 1992); that is, in general,

\[ \min P(u) \ge \max P^\sharp(\tau). \tag{8.108} \]


Because the criticality condition $\delta P^\sharp(\tau) = 0$ is not equivalent to the
primal variational problem and the weak duality is not appreciated in the
field of continuum mechanics, the existence of a perfect (i.e., without a duality
gap), pure (i.e., involving only the stress tensor as variational argument)
complementary variational principle in finite elasticity has been debated among
well-known scientists for more than three decades (see Hellinger, 1914, Hill,
1978, Koiter, 1973, 1976, Lee and Shield, 1980a,b, Levinson, 1965, Ogden,
1975, 1977, Zubov, 1970). This problem was finally solved by the canonical
dual transformation and triality theory in Gao (1992, 1999c).

Similar to the quadratic operator $\Lambda(u) = \tfrac12|Bu|^2$ (see equation (8.51))
chosen for the nonconvex optimization problem (8.47), we let

\[ E = \Lambda(u) = \tfrac12[(\nabla u)^T (\nabla u) - I], \tag{8.109} \]

which is a symmetrical tensor field in $\mathbb{R}^{n\times n}$. In finite deformation theory,
$E$ is the well-known Green-St. Venant strain tensor. Thus, in terms of $E$,
the stored energy for St. Venant-Kirchhoff material can be written in the
canonical form $W(\nabla u) = V(\Lambda(u))$, and

\[ V(E) = \tfrac12 E : D : E \]

is a (quadratic) convex function of the symmetrical tensor $E \in \mathbb{R}^{n\times n}$. The
canonical dual variable $E^* = \delta V(E) = D \cdot E$ is called the second
Piola-Kirchhoff stress tensor, denoted as $T$. The Legendre conjugate

\[ V^*(T) = \mathrm{sta}\{E : T - V(E)\} = \tfrac12 T : D^{-1} : T \tag{8.110} \]

is also a quadratic function. Let $\mathcal{U}_a = \{u \in W^{1,p}(\Omega; \mathbb{R}^m) \mid u = \bar u \text{ on } \Gamma_u\}$
(where $W^{1,p}$ is a standard Sobolev space with $p \in (1, \infty)$) and $\mathcal{V}_a^* =
C(\Omega; \mathbb{R}^{n\times n})$. Replacing $W(\nabla u)$ by its canonical dual transformation
$V(\Lambda(u)) = E(u) : T - V^*(T)$, the generalized complementary energy
$\Xi : \mathcal{U}_a \times \mathcal{V}_a^* \to \mathbb{R}$ has the following format,

\[ \Xi(u, T) = \int_\Omega [E(u) : T - V^*(T) - u \cdot f] \, d\Omega - \int_{\Gamma_t} u \cdot t \, d\Gamma, \tag{8.111} \]

which is the well-known Hellinger-Reissner generalized complementary
energy in continuum mechanics.

Furthermore, if we replace $V^*(T)$ by its bi-Legendre transformation $E :
T - V(E)$, then $\Xi(u, T)$ can be written as

\[ \Xi_{hw}(u, T, E) = \int_\Omega [(\Lambda(u) - E) : T + V(E) - u \cdot f] \, d\Omega - \int_{\Gamma_t} u \cdot t \, d\Gamma. \tag{8.112} \]

This is the well-known Hu-Washizu generalized potential energy in nonlinear
elasticity. The Hu-Washizu variational principle has important applications
in computational analysis of thin-walled structures, where the geometrical
equation $E = \Lambda(u)$ is usually proposed by certain geometrical hypotheses.

Because $\Lambda(u)$ is a quadratic operator, its Gâteaux differential at $\bar u$ in the
direction $u$ is $\delta\Lambda(\bar u; u) = \Lambda_t(\bar u)u = (\nabla \bar u)^T (\nabla u)$ and

\[ \Lambda_c(u) = \Lambda(u) - \Lambda_t(u)u = -\tfrac12((\nabla u)^T (\nabla u) + I). \]

By using the Gauss-Green theorem, the balance operator $\Lambda_t^*(u)$ can be
defined as

\[ \Lambda_t^*(u)T = \begin{cases} -\nabla \cdot [(\nabla u)^T \cdot T]^T & \text{in } \Omega, \\ n \cdot [(\nabla u)^T \cdot T]^T & \text{on } \Gamma. \end{cases} \]

The complementary gap function in this problem is a quadratic functional:

\[ G_c(u, T) = \langle -\Lambda_c(u); T\rangle = \int_\Omega \tfrac12 \mathrm{tr}[(\nabla u)^T \cdot T \cdot (\nabla u) + T] \, d\Omega. \tag{8.113} \]

Thus, the complementary variational problem is to find critical (stationary)
points $(\bar u, \bar T)$ such that

\[ P^c(\bar u, \bar T) = \mathrm{sta}\Big\{ \int_\Omega \tfrac12 \mathrm{tr}[(\nabla u)^T \cdot T \cdot (\nabla u) + T] \, d\Omega + \int_\Omega V^*(T) \, d\Omega \Big\} \tag{8.114} \]
\[ \text{s.t.} \quad -\nabla \cdot [(\nabla u)^T \cdot T]^T = f \ \text{in } \Omega, \qquad n \cdot [(\nabla u)^T \cdot T]^T = t \ \text{on } \Gamma_t. \]

The following result is due to Gao and Strang (1989a).


Theorem 8.11. (Complementary-Dual Variational Principle (Gao
and Strang, 1989a)) If $(\bar u, \bar T)$ is a critical point of the complementary
variational problem (8.114), then $\bar u$ is a critical point of the total potential
energy $P(u)$ defined by (8.104), and

\[ P(\bar u) + P^c(\bar u, \bar T) = 0. \]

Moreover, if the complementary gap function satisfies

\[ G_c(u, \bar T) \ge 0, \quad \forall u \in \mathcal{U}_a, \tag{8.115} \]

then $\bar u$ is a global minimizer of $P(u)$ and

\[ P(\bar u) = \min_u P(u) = \max_T \min_u \Xi(u, T) = -P^c(\bar u, \bar T), \tag{8.116} \]

subject to $T(x)$ being positive definite for all $x \in \Omega$.


This theorem shows that the positivity of the complementary gap func- 
tion G.(u, T) provides a sufficient condition for a global minimizer, and the 
equalities (8.11) and (8.116) indicate that there is no duality gap between the 
total potential P(u) and its complementary energy P‘(u, T). The physical 
significance is also clear: a finite deformed material is stable if the second 
Piola—Kirchhoff stress tensor T(x) is positive definite everywhere in the do- 
main 92. The linear operator B = V in this nonconvex variational problem 
is a partial differential operator, therefore it is difficult to find its inverse. 
It took more than ten years before the canonical dual problem was finally 
formulated in Gao (1999a,c). To see this, let us assume that for a given force 
vector field t on the boundary I;, the first Piola~Kirchhoff stress tensor T(x) 
can be defined by solving the following boundary value problem, 
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—V-ri(x)=f nQ, ner? =t onl. (8.117) 


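Then the canonical dual functional $P^d(T)$ can be formulated as

\[ P^d(T) = -\int_\Omega \tfrac12 \operatorname{tr}\left(\tau \cdot T^{-1} \cdot \tau^T + T\right) d\Omega - \int_\Omega V^*(T)\, d\Omega. \tag{8.118} \]

The criticality condition $\delta P^d(T) = 0$ gives the canonical dual equation

\[ T \cdot \left(2\,\delta V^*(T) + I\right) \cdot T = \tau^T \cdot \tau. \tag{8.119} \]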


For St. Venant-Kirchhoff material, $V^*(T) = \tfrac12 T : D^{-1} : T$ is a quadratic
function and its Gâteaux derivative $\delta V^*(T) = D^{-1} : T$ is linear. In this case,
the canonical dual equation (8.119) is a cubic equation, which is similar to
the dual algebraic equations (8.57) and (8.70).
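As a hedged illustration of why (8.119) becomes cubic, consider a scalar (one-dimensional) analogue in which $T$, $\tau$, and the modulus $D > 0$ are all scalars. This reduction is an assumption made here for illustration only; it is not taken from the cited papers.

```latex
\[
V^*(T) = \frac{T^2}{2D}, \qquad \delta V^*(T) = \frac{T}{D}
\quad\Longrightarrow\quad
T\left(\frac{2T}{D} + 1\right)T = \tau^2
\quad\Longleftrightarrow\quad
\frac{2}{D}\,T^3 + T^2 - \tau^2 = 0 .
\]
```

In this scalar setting the dual equation is a cubic polynomial in $T$, so it has at most three real roots for a given $\tau$.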


Theorem 8.12. (Pure Complementary Energy Principle (Gao,
1999a,c)) Suppose that for a given force field $t(x)$ on $\Gamma_t$, the first
Piola-Kirchhoff stress field $\tau(x)$ is defined by (8.117). Then each solution $\bar{T}$ of the
canonical dual equation (8.119) is a critical point of $P^d$, the vector defined
by the line integral

\[ \bar{u} = \int \tau \cdot \bar{T}^{-1}\, dx \tag{8.120} \]

is a critical point of $P(u)$, and

\[ P(\bar{u}) = P^d(\bar{T}). \]


This theorem presents an analytic solution to the nonconvex potential 
variational problem (8.104). In the finite deformation theory of elasticity, 
this pure complementary variational principle is also known as the Gao prin- 
ciple (Li and Gupta, 2006), which holds also for the general canonical energy 
function V(E). Similar to Theorem 8.9, the extremality of the critical points 
can be identified by the complementary gap function. Applications of this 
pure complementary variational principle for solving nonconvex/nonsmooth 
boundary value problems are illustrated in Gao (1999c, 2000a) and Gao and 
Ogden (2008a,b). 


8.7 Applications to Semilinear Nonconvex Systems 


The canonical dual transformation and the associated triality theory can
be used to solve many difficult problems in engineering and science. In this
section, we present applications for solving the following nonconvex
minimization problem

\[ (\mathcal{P}): \quad \min\left\{ P(u) = W(u) + \tfrac12\langle u, Au\rangle - \langle u, f\rangle \;:\; u \in \mathcal{U}_k \right\}, \tag{8.121} \]

where $W(u) : \mathcal{U}_k \to \mathbb{R}$ is a nonconvex function, and $A : \mathcal{U}_a \subset \mathcal{U} \to \mathcal{U}_a^*$ is
a linear operator. If $W(u)$ is Gâteaux differentiable, the criticality condition
$\delta P(u) = 0$ leads to a nonlinear Euler equation

\[ Au + \delta W(u) = f. \tag{8.122} \]


The abstract form (8.122) of the primal problem $(\mathcal{P})$ covers many situations.
In nonconvex mechanics (cf. Gao, Ogden, and Stavroulakis, 2001, Gao,
2003b), where $\mathcal{U}$ is an infinite-dimensional function space, the state variable
$u(x)$ is a field function, and $A : \mathcal{U} \to \mathcal{U}^*$ is usually a partial differential
operator. In this case, the governing equation (8.122) is a so-called semilinear
equation. For example, in the Landau-Ginzburg theory of superconductivity,
$A = \Delta$ is the Laplacian over a given space domain $\Omega \subset \mathbb{R}^n$ and

\[ W(u) = \int_\Omega \tfrac12 \alpha \left(\tfrac12 u^2 - \lambda\right)^2 d\Omega \tag{8.123} \]

is the Landau double-well potential, in which $\alpha, \lambda > 0$ are material constants.
Then the governing equation (8.122) leads to the well-known
Landau-Ginzburg equation

\[ \Delta u + \alpha u\left(\tfrac12 u^2 - \lambda\right) = f. \]


This semilinear differential equation plays an important role in material science
and physics, including ferroelectricity, ferromagnetism, and superconductivity.
In a more complicated case where $A = \Delta + \operatorname{curl}\operatorname{curl}$, we have

\[ \Delta u + \operatorname{curl}\operatorname{curl} u + \alpha u\left(\tfrac12 u^2 - \lambda\right) = f, \]

which is the so-called Cahn-Hilliard equation in liquid crystal theory. Due
to the nonconvexity of the double-well function $W(u)$, any solution of the
semilinear differential equation (8.122) is only a critical point of the total
potential $P(u)$. Traditional direct analysis and related numerical methods
for finding the global minimizer of the nonconvex variational problem have
proven unsuccessful to date.

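In dynamical systems, if $A = -\partial_{tt} + \Delta$ is a wave operator over a given
space-time domain $\Omega \subset \mathbb{R}^n \times \mathbb{R}$, then (8.122) is the well-known nonlinear
Schrödinger equation

\[ -u_{tt} + \Delta u + \alpha u\left(\tfrac12 u^2 - \lambda\right) = f. \]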


This equation appears in many branches of physics. It provides one of the
simplest models of the unified field theory. It can also be found in the theory
of dislocations in metals, in the theory of Josephson junctions, as well as in
interpreting certain biological processes such as DNA dynamics.

Fig. 8.7 Numerical results by ode23 (top) and ode15s (bottom) solvers in MATLAB. Panels: (a) $u(t)$; (b) trajectory in phase space $u$-$p$.

In the simplest case where $u$ depends only on time, the nonlinear
Schrödinger equation reduces to the well-known Duffing equation

\[ u_{tt} = \alpha u\left(\tfrac12 u^2 - \lambda\right) - f. \]


Even for this one-dimensional ordinary differential equation, an analytic solution
is still very difficult to obtain. It is known that this equation is extremely
sensitive to the initial conditions and the input (driving force) $f(t)$. Figure
8.7 shows that, for the same given data, two ODE solvers in MATLAB (ode23 and
ode15s) produce very different vibration modes and "trajectories" in
the phase space $u$-$p$ ($p = u_t$). Mathematically speaking, due to the
nonconvexity of the function $W(u)$, very small perturbations of the system's initial
conditions and parameters may lead the system to different local minima
with significantly different performance characteristics, that is, the so-called
chaotic phenomena. Numerical results vary with the methods used. This is
one of the main reasons why traditional perturbation analysis and direct
approaches cannot successfully be applied to nonconvex systems (Gao, 2003b).
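The dependence of the numerical result on the integration method can be probed with a small experiment. The sketch below is not a reproduction of Figure 8.7. It sets $f = 0$ so that an exact invariant $H(u, p) = \tfrac12 p^2 - \tfrac12\alpha(\tfrac12 u^2 - \lambda)^2$ is available, replaces MATLAB's ode23 and ode15s with a hand-written explicit Euler step and a classical fourth-order Runge-Kutta step, and uses illustrative values $\alpha = 1$, $\lambda = 1.5$, $u(0) = 0.5$, $p(0) = 0$. All of these choices, and all function names, are assumptions made for this example.

```python
# Sketch: integrate the unforced Duffing-type equation
#     u_tt = alpha * u * (0.5 * u**2 - lam)
# with two different one-step methods and compare the drift of the invariant
#     H(u, p) = 0.5 * p**2 - 0.5 * alpha * (0.5 * u**2 - lam)**2,   p = u_t.
# All numerical values below are illustrative assumptions.

ALPHA, LAM = 1.0, 1.5


def accel(u):
    """Right-hand side of u_tt = alpha*u*(u^2/2 - lam) with f = 0."""
    return ALPHA * u * (0.5 * u * u - LAM)


def invariant(u, p):
    """Quantity conserved by the exact flow when f = 0."""
    return 0.5 * p * p - 0.5 * ALPHA * (0.5 * u * u - LAM) ** 2


def euler_step(u, p, dt):
    """One explicit Euler step for the first-order system (u, p)."""
    return u + dt * p, p + dt * accel(u)


def rk4_step(u, p, dt):
    """One classical fourth-order Runge-Kutta step for (u, p)."""
    k1u, k1p = p, accel(u)
    k2u, k2p = p + 0.5 * dt * k1p, accel(u + 0.5 * dt * k1u)
    k3u, k3p = p + 0.5 * dt * k2p, accel(u + 0.5 * dt * k2u)
    k4u, k4p = p + dt * k3p, accel(u + dt * k3u)
    u_new = u + dt * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
    p_new = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return u_new, p_new


def integrate(step, u0, p0, dt, n_steps):
    """Apply the given one-step method n_steps times."""
    u, p = u0, p0
    for _ in range(n_steps):
        u, p = step(u, p, dt)
    return u, p


u0, p0, dt, n_steps = 0.5, 0.0, 0.01, 2000
h0 = invariant(u0, p0)
drift_euler = abs(invariant(*integrate(euler_step, u0, p0, dt, n_steps)) - h0)
drift_rk4 = abs(invariant(*integrate(rk4_step, u0, p0, dt, n_steps)) - h0)
print(drift_euler, drift_rk4)
```

With these settings the Euler run drifts noticeably in $H$ while the Runge-Kutta run keeps it nearly constant, which is one concrete sense in which numerical results vary with the method used.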
Numerical discretization of the nonconvex variational problem $(\mathcal{P})$ in
mathematical physics usually leads to a nonconvex optimization problem
in finite-dimensional space $\mathcal{U} = \mathbb{R}^n$, where the field variable $u$ is simply a
vector $x \in \mathcal{U}$, the bilinear form $\langle x, x^*\rangle = x^T x^* = x \cdot x^*$ is the dot product of
two vectors, and the operator $A : \mathbb{R}^n \to \mathcal{U}^* = \mathbb{R}^n$ is a symmetric matrix.
In d.c. (difference of convex functions) programming and discrete dynamical
systems, the operator $A = A^T \in \mathbb{R}^{n\times n}$ is usually indefinite. The problem
(8.121) is then one of global minimization in $\mathbb{R}^n$. In this section, we discuss
the canonical dual transformation method for solving this type of problem.


8.7.1 Unconstrained Nonconvex Optimization Problem
with Double-Well Energy


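First, let us consider an unconstrained global optimization problem in
finite-dimensional space $\mathcal{U} = \mathbb{R}^n$, where $A = A^T \in \mathbb{R}^{n\times n}$ is a matrix, and $W(x)$ is
a double-well function of the type $W(x) = \tfrac12\left(\tfrac12|x|^2 - \lambda\right)^2$. Then the primal
problem is

\[ \min\left\{ P(x) = \tfrac12\left(\tfrac12|x|^2 - \lambda\right)^2 + \tfrac12 x^T A x - x^T f \;:\; x \in \mathcal{U}_k = \mathbb{R}^n \right\}. \tag{8.124} \]

The necessary condition $\delta P(x) = 0$ leads to a coupled nonlinear algebraic
system

\[ Ax + \left(\tfrac12|x|^2 - \lambda\right)x = f. \tag{8.125} \]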


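Clearly, a direct method for solving this nonlinear equation with $n$ unknowns is
elusive. By choosing the quadratic operator $\xi = \Lambda(x) = \tfrac12|x|^2$, the canonical function
$V(\xi) = \tfrac12(\xi - \lambda)^2$ is a quadratic function. By the fact that $\tfrac12|x|^2 = \xi \ge 0$,
$\forall x \in \mathbb{R}^n$, the range of the quadratic mapping $\Lambda(x)$ is

\[ \mathcal{V}_a = \{\xi \in \mathbb{R} \mid \xi \ge 0\}. \]

Thus, on $\mathcal{V}_a$, the canonical duality relation $\varsigma = \delta V(\xi) = \xi - \lambda$ is one-to-one
and the range of the canonical dual mapping $\delta V : \mathcal{V}_a \to \mathcal{V}_a^* \subset \mathbb{R}$ is

\[ \mathcal{V}_a^* = \{\varsigma \in \mathbb{R} \mid \varsigma \ge -\lambda\}. \]

It turns out that $(\xi, \varsigma)$ is a canonical pair on $\mathcal{V}_a \times \mathcal{V}_a^*$ and the Legendre
conjugate $V^*$ is also a quadratic function:

\[ V^*(\varsigma) = \operatorname{sta}\{\xi\varsigma - V(\xi) : \xi \in \mathcal{V}_a\} = \tfrac12\varsigma^2 + \lambda\varsigma. \]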
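The closed form of $V^*$ can be checked in one step. The short derivation below is a sketch that uses only the duality relation $\varsigma = \xi - \lambda$ and the definition of the Legendre conjugate.

```latex
\[
\varsigma = \xi - \lambda \;\Longrightarrow\; \xi = \varsigma + \lambda,
\qquad
V^*(\varsigma) = \xi\varsigma - \tfrac12(\xi - \lambda)^2
= (\varsigma + \lambda)\varsigma - \tfrac12\varsigma^2
= \tfrac12\varsigma^2 + \lambda\varsigma .
\]
```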


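For a given $\varsigma \in \mathcal{V}_a^*$, the $\Lambda$-conjugate transformation

\[ U^\Lambda(\varsigma) = \operatorname{sta}\left\{ \tfrac12\varsigma|x|^2 + \tfrac12 x^T A x - x^T f \;:\; x \in \mathbb{R}^n \right\} = -\tfrac12 f^T (A + \varsigma I)^{-1} f \]

is well defined on the canonical dual feasible space $\mathcal{V}_k^*$, given by

\[ \mathcal{V}_k^* = \{\varsigma \in \mathbb{R} \mid \det(A + \varsigma I) \ne 0,\ \varsigma \ge -\lambda\}. \tag{8.126} \]

Thus, the canonical dual problem can be proposed as the following (Gao,
2003a):

\[ (\mathcal{P}^d): \quad \max\left\{ P^d(\varsigma) = -\tfrac12 f^T (A + \varsigma I)^{-1} f - \tfrac12\varsigma^2 - \lambda\varsigma \;:\; \varsigma \in \mathcal{V}_k^* \right\}. \tag{8.127} \]

This is a nonlinear programming problem with only one variable! The
criticality condition of this dual problem leads to the dual algebraic equation

\[ \varsigma + \lambda = \tfrac12 f^T (A + \varsigma I)^{-2} f. \tag{8.128} \]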


For any given $A \in \mathbb{R}^{n\times n}$ and $f \in \mathbb{R}^n$, this equation can be solved by
Mathematica. Extremality conditions of these dual solutions can be identified by
the following theorem (see Gao, 2003a).


Theorem 8.13. (Gao, 2003a) If the matrix $A$ has $r$ distinct nonzero
eigenvalues such that

\[ a_1 < a_2 < \cdots < a_r, \]

then the canonical dual algebraic equation (8.128) has at most $2r+1$ roots
$\bar\varsigma_1 > \bar\varsigma_2 \ge \bar\varsigma_3 \ge \cdots \ge \bar\varsigma_{2r+1}$.
For each $\bar\varsigma_i$, the vector

\[ \bar{x}_i = (A + \bar\varsigma_i I)^{-1} f, \quad \forall i = 1, 2, \ldots, 2r+1, \tag{8.129} \]

is a solution to the semilinear algebraic equation (8.125) and

\[ P(\bar{x}_i) = P^d(\bar\varsigma_i), \quad \forall i = 1, \ldots, 2r+1. \tag{8.130} \]

Particularly, the canonical dual problem has at most one global maximizer
$\bar\varsigma_1 > -a_1$ in the open interval $(-a_1, +\infty)$, and $\bar{x}_1$ is a global minimizer of
$P(x)$ over $\mathcal{U}_k$; that is,

\[ P(\bar{x}_1) = \min_{x \in \mathcal{U}_k} P(x) = \max_{\varsigma > -a_1} P^d(\varsigma) = P^d(\bar\varsigma_1). \tag{8.131} \]

Moreover, in each open interval $(-a_{i+1}, -a_i)$, the canonical dual equation
(8.128) has at most two real roots $-a_{i+1} < \bar\varsigma_{2i+1} \le \bar\varsigma_{2i} < -a_i$, $\forall i = 1, \ldots, 2r+1$;
$\bar\varsigma_{2i}$ is a local minimizer of $P^d$, and $\bar\varsigma_{2i+1}$ is a local maximizer of $P^d(\varsigma)$.

Fig. 8.9 Graph of the dual function $P^d(\varsigma)$.


As an example in two-dimensional space, which is illustrated in Figure 8.8,
we simply choose $A = \{a_{ij}\}$ with $a_{11} = 0.6$, $a_{22} = -0.5$, $a_{12} = a_{21} = 0$,
and $f = \{0.2, -0.1\}$. For a given parameter $\lambda = 1.5$, and $\alpha = 1.0$, the graph
of $P(x)$ is a nonconvex surface (see Figure 8.8a) with four potential wells
and one local maximizer. The graph of the canonical dual function $P^d(\varsigma)$ is
shown in Fig. 8.9. The canonical dual algebraic equation (8.128) has a
total of five real roots:

\[ \bar\varsigma_5 = -1.47 < \bar\varsigma_4 = -0.77 < \bar\varsigma_3 = -0.46 < \bar\varsigma_2 = 0.45 < \bar\varsigma_1 = 0.55, \]

and we have

\[ P^d(\bar\varsigma_5) = 1.15 > P^d(\bar\varsigma_4) = 0.98 > P^d(\bar\varsigma_3) = 0.44 > P^d(\bar\varsigma_2) = -0.70 > P^d(\bar\varsigma_1). \]

By the triality theory, we know that $\bar{x}_1 = (A + \bar\varsigma_1 I)^{-1} f = \{0.17, -2.02\}$ is
a global minimizer of $P(x)$; and accordingly, $P(\bar{x}_1) = P^d(\bar\varsigma_1) = -1.1$; and
that $\bar{x}_5 = \{-0.23, 0.05\}$ and $\bar{x}_3 = \{1.44, 0.10\}$ are local maximizers, whereas
$\bar{x}_4 = \{-1.21, 0.08\}$ and $\bar{x}_2 = \{0.19, 1.96\}$ are local minimizers.

Fig. 8.10 Graph of the dual function $P^d(\varsigma)$ for a four-dimensional problem.
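The two-dimensional example can be checked numerically. The sketch below, in plain Python, uses the fact that $A$ is diagonal in this example, so $(A + \varsigma I)^{-1}$ is computed entry by entry. It finds the root of (8.128) in $(-a_1, +\infty)$ by bisection, recovers the corresponding primal point, and compares it with a brute-force grid search. The function names, the bisection bracket, and the grid range $[-3, 3]^2$ are assumptions made for this illustration; they do not come from Gao (2003a).

```python
# Sketch: canonical dual solution of the two-dimensional double-well example.
# Data (A, f, lambda) are taken from the example in the text; everything else
# (names, bracket, grid) is an illustrative assumption.

A_DIAG = (0.6, -0.5)   # diagonal entries (eigenvalues) of A
F = (0.2, -0.1)
LAM = 1.5


def primal(x):
    """P(x) = 0.5*(0.5*|x|^2 - lam)^2 + 0.5*x^T A x - x^T f."""
    half_norm2 = 0.5 * sum(xi * xi for xi in x)
    quad = 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))
    lin = sum(fi * xi for fi, xi in zip(F, x))
    return 0.5 * (half_norm2 - LAM) ** 2 + quad - lin


def dual(s):
    """P^d(s) = -0.5*f^T (A + s I)^{-1} f - 0.5*s^2 - lam*s."""
    inv_term = sum(fi * fi / (a + s) for a, fi in zip(A_DIAG, F))
    return -0.5 * inv_term - 0.5 * s * s - LAM * s


def dual_residual(s):
    """Residual of (8.128): 0.5*f^T (A + s I)^{-2} f - (s + lam)."""
    return 0.5 * sum(fi * fi / (a + s) ** 2 for a, fi in zip(A_DIAG, F)) - (s + LAM)


def bisect(func, lo, hi, iters=200):
    """Plain bisection; assumes func(lo) > 0 > func(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# For s > -a_1 the residual decreases monotonically from +inf,
# so there is exactly one root in this interval.
s1 = bisect(dual_residual, -min(A_DIAG) + 1e-9, 10.0)
x1 = tuple(fi / (a + s1) for a, fi in zip(A_DIAG, F))

# Brute-force check on a uniform grid (step 0.02 over [-3, 3]^2).
grid = [-3.0 + 0.02 * k for k in range(301)]
grid_min = min(primal((u, v)) for u in grid for v in grid)

print(s1, x1, primal(x1), dual(s1), grid_min)
```

The computed root and primal point agree with the rounded values $\bar\varsigma_1 = 0.55$ and $\bar{x}_1 = \{0.17, -2.02\}$ quoted above, and no grid point attains a smaller value of $P$.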

The graph of $P^d(\varsigma)$ for a four-dimensional problem is shown in Figure 8.10.
It can be easily seen that $P^d(\varsigma)$ is strictly concave for $\varsigma > -a_1$. Within each
interval $-a_{i+1} < \varsigma < -a_i$, $\forall i = 1, 2, \ldots, r$, the dual function $P^d(\varsigma)$ has at
most one local minimum and one local maximum. These local extrema can
be identified by the triality theory (Gao, 2003a).

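The nonconvex function $W(x)$ in (8.121) could be in many other forms,
for example,

\[ W(x) = \exp\left(\tfrac12|Bx|^2 - \lambda\right), \]

where $B \in \mathbb{R}^{m\times n}$ is a given matrix and $\lambda > 0$ is a constant. In this case, the
primal problem $(\mathcal{P})$ is a quadratic-exponential minimization problem

\[ \min\left\{ P(x) = \exp\left(\tfrac12|Bx|^2 - \lambda\right) + \tfrac12 x^T A x - x^T f \;:\; x \in \mathbb{R}^n \right\}. \]

By letting $\xi = \Lambda(x) = \tfrac12|Bx|^2 - \lambda$, the canonical function $V(\xi) = \exp(\xi)$ is
convex and its Legendre conjugate is $V^*(\varsigma) = \varsigma(\ln\varsigma - 1)$. The canonical dual
problem was formulated in Gao and Ruan (2007):

\[ (\mathcal{P}^d): \quad \max\left\{ P^d(\varsigma) = -\tfrac12 f^T G^{-1}(\varsigma) f - (\varsigma\log\varsigma - \varsigma) - \lambda\varsigma \;:\; \varsigma \in \mathcal{V}_+^* \right\}, \]

where $G(\varsigma) = A + \varsigma B^T B$ and the dual feasible space is defined by

\[ \mathcal{V}_+^* = \{\varsigma \in \mathbb{R} \mid \varsigma > 0,\ G(\varsigma) \text{ is positive definite}\}. \]

Detailed study of this case was given in Gao and Ruan (2007).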
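The Legendre conjugate quoted for the exponential canonical function follows from a short computation, sketched here for $\varsigma > 0$.

```latex
\[
\varsigma = \delta V(\xi) = e^{\xi} \;\Longrightarrow\; \xi = \ln\varsigma \quad (\varsigma > 0),
\qquad
V^*(\varsigma) = \xi\varsigma - e^{\xi} = \varsigma\ln\varsigma - \varsigma = \varsigma(\ln\varsigma - 1).
\]
```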


8.7.2 Constrained Quadratic Minimization over a 
Sphere 


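If the function $W(x)$ in problem (8.121) is an indicator of a constraint set
$\mathcal{U}_k \subset \mathbb{R}^n$, that is,

\[ W(x) = \begin{cases} 0 & \text{if } x \in \mathcal{U}_k, \\ +\infty & \text{otherwise,} \end{cases} \]

then the general problem (8.121) becomes a constrained nonconvex quadratic
optimization problem, denoted as

\[ (\mathcal{P}_k): \quad \min\left\{ P(x) = \tfrac12\langle x, Ax\rangle - \langle x, f\rangle \;:\; x \in \mathcal{U}_k \right\}. \tag{8.132} \]

General constrained global optimization problems are discussed in the next
section. Here, we consider the following quadratic minimization problem with
a nonlinear constraint

\[ (\mathcal{P}_q): \quad \min\ P(x) = \tfrac12 x^T A x - f^T x \quad \text{s.t. } |x| \le r, \tag{8.133} \]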


where $A = A^T \in \mathbb{R}^{n\times n}$ is a symmetric matrix, $f \in \mathbb{R}^n$ is a given vector,
and $r > 0$ is a constant. The feasible space $\mathcal{U}_k = \{x \in \mathbb{R}^n \mid |x| \le r\}$ is
a hypersphere in $\mathbb{R}^n$. This problem often arises as a subproblem in general
optimization algorithms (cf. Powell, 2002). Often, in the model trust region
methods, the objective function in nonlinear programming is approximated
locally by a quadratic function. In such cases, the approximation is restricted
to a small region around the current iterate. These methods therefore require
the solution of quadratic programming problems over spheres.

To solve this constrained nonconvex minimization by using a traditional
Lagrange multiplier method, we have

\[ L(x, \lambda) = \tfrac12 x^T A x - f^T x + \lambda(|x| - r). \tag{8.134} \]

For a given $\lambda \ge 0$, the traditional dual function can be defined via the
Fenchel-Moreau-Rockafellar duality theory:

\[ P^*(\lambda) = \min\{L(x, \lambda) : x \in \mathbb{R}^n\}, \tag{8.135} \]

which is a concave function of $\lambda$. However, due to the nonconvexity of $P(x)$,
we have only the weak duality relationship

\[ \min_{|x| \le r} P(x) \ge \max_{\lambda \ge 0} P^*(\lambda). \]

The duality gap given by the slack in the above inequality is typically
nonzero, indicating that the dual solution does not solve the primal problem.
On the other hand, the KKT condition leads to a coupled nonlinear algebraic
system

\[ Ax + \lambda|x|^{-1}x = f, \]
\[ \lambda \ge 0, \quad |x| \le r, \quad \lambda(|x| - r) = 0. \]

As indicated by Floudas and Visweswaran (1995), due to the presence
of the nonlinear sphere constraint, the solution of $(\mathcal{P}_q)$ is likely to be
irrational, which implies that it is not possible to exactly compute the solution.
Therefore, many polynomial time algorithms have been suggested to
compute the approximate solution to this problem (see Ye, 1992). However, by
the canonical dual transformation this problem has been solved completely
in Gao (2004b).

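First, we need to reformulate the constraint $|x| \le r$ in the canonical form

\[ \xi = \Lambda(x) = \tfrac12|x|^2. \]

Let $\lambda = \tfrac12 r^2$; then the canonical function $V(\Lambda(x))$ can be defined as

\[ V(\xi) = \begin{cases} 0 & \text{if } \xi \le \lambda, \\ +\infty & \text{otherwise,} \end{cases} \]

whose effective domain is $\mathcal{V}_a = \{\xi \in \mathbb{R} \mid \xi \le \lambda\}$. Letting
$U(x) = x^T f - \tfrac12 x^T A x$, the primal problem $(\mathcal{P}_q)$ can be reformulated in the following
canonical form,

\[ \min\{\Pi(x) = V(\Lambda(x)) - U(x) : x \in \mathbb{R}^n\}. \tag{8.136} \]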
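By the Fenchel transformation, the conjugate of $V(\xi)$ is

\[ V^\sharp(\varsigma) = \max_{\xi \in \mathcal{V}_a}\{\xi\varsigma - V(\xi)\} = \begin{cases} \lambda\varsigma & \text{if } \varsigma \ge 0, \\ +\infty & \text{otherwise,} \end{cases} \tag{8.137} \]

whose effective domain is $\mathcal{V}_a^* = \{\varsigma \in \mathbb{R} \mid \varsigma \ge 0\}$. The dual feasible space $\mathcal{V}_k^*$
in this problem is

\[ \mathcal{V}_k^* = \{\varsigma \in \mathbb{R} \mid \varsigma \ge 0,\ \det(A + \varsigma I) \ne 0\}. \]

Thus, for a given $\varsigma \in \mathcal{V}_k^*$, the $\Lambda$-conjugate of $U$ can be formulated as

\[ U^\Lambda(\varsigma) = \operatorname{sta}\left\{ \tfrac12\varsigma|x|^2 + \tfrac12 x^T A x - x^T f \;:\; x \in \mathbb{R}^n \right\} = -\tfrac12 f^T (A + \varsigma I)^{-1} f, \]

and the problem $(\mathcal{P}_q^d)$, which is perfectly dual to $(\mathcal{P}_q)$, is given by

\[ (\mathcal{P}_q^d): \quad \max\left\{ P^d(\varsigma) = -\tfrac12 f^T (A + \varsigma I)^{-1} f - \lambda\varsigma \;:\; \varsigma \in \mathcal{V}_k^* \right\}. \tag{8.138} \]

The criticality condition $\delta P^d(\varsigma) = 0$ leads to a nonlinear algebraic equation

\[ \tfrac12 f^T (A + \varsigma I)^{-2} f = \lambda. \tag{8.139} \]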
Similar to (8.128), this equation can also be solved easily by using
Mathematica. Each root $\bar\varsigma$ is a critical point of $P^d(\varsigma)$. The following theorem presents
a complete set of solutions for this dual problem.


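Theorem 8.14. (Complete Solution to $(\mathcal{P}_q)$ (Gao, 2004b)) Suppose
that the symmetric matrix $A$ has $p \le n$ distinct eigenvalues, and $i_d \le p$ of
them are negative such that

\[ a_1 < a_2 < \cdots < a_{i_d} < 0 \le a_{i_d+1} < \cdots < a_p. \]

Then for a given vector $f \in \mathbb{R}^n$, the canonical dual problem $(\mathcal{P}_q^d)$ has at most
$2i_d + 1$ critical points $\bar\varsigma_i$, $i = 1, \ldots, 2i_d + 1$, satisfying the following distribution
law,

\[ \bar\varsigma_1 > -a_1 > \bar\varsigma_2 \ge \bar\varsigma_3 > -a_2 > \cdots > -a_{i_d} > \bar\varsigma_{2i_d} \ge \bar\varsigma_{2i_d+1} > 0. \tag{8.140} \]

For each $\bar\varsigma_i > 0$, $i = 1, \ldots, 2i_d + 1$, the vector defined by

\[ \bar{x}_i = (A + \bar\varsigma_i I)^{-1} f \tag{8.141} \]

is a KKT point of the problem $(\mathcal{P}_q)$ and

\[ P(\bar{x}_i) = P^d(\bar\varsigma_i), \quad i = 1, 2, \ldots, 2i_d + 1. \tag{8.142} \]

Moreover, if $i_d > 0$, then the problem $(\mathcal{P}_q)$ has at most $2i_d + 1$ critical points
on the boundary of the sphere; that is,

\[ \tfrac12|\bar{x}_i|^2 = \lambda, \quad i = 1, \ldots, 2i_d + 1. \tag{8.143} \]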
Because $A = A^T$, there exists an orthogonal matrix $R^T = R^{-1}$ such that
$A = R^T D R$, where $D = (a_i\delta_{ij})$ is a diagonal matrix. For the given vector
$f \in \mathbb{R}^n$, let $g = Rf = (g_i)$, and define

\[ \psi(\varsigma) = \tfrac12\sum_i g_i^2 (a_i + \varsigma)^{-2}. \tag{8.144} \]

Fig. 8.11 Graph of $\psi(\varsigma)$.

Clearly, this real-valued function $\psi(\varsigma)$ is strictly convex within each interval
$-a_{i+1} < \varsigma < -a_i$, as well as over the intervals $-\infty < \varsigma < -a_p$ and
$-a_1 < \varsigma < \infty$ (see Figure 8.11). Thus, for a given parameter $\lambda > 0$, the algebraic
equation

\[ \psi(\varsigma) = \tfrac12\sum_i g_i^2 (a_i + \varsigma)^{-2} = \lambda \tag{8.145} \]

has at most $2p$ solutions $\{\bar\varsigma_i\}$ satisfying $-a_{j+1} < \bar\varsigma_{2j+1} \le \bar\varsigma_{2j} < -a_j$ for
$j = 1, \ldots, p-1$, and $\bar\varsigma_1 > -a_1$, $\bar\varsigma_{2p} < -a_p$. Because $A$ has only $i_d$ negative
eigenvalues, the equality $\psi(\varsigma) = \lambda$ has at most $2i_d + 1$ strictly positive roots
$\{\bar\varsigma_i\} > 0$, $i = 1, \ldots, 2i_d + 1$. By the complementarity condition
$\bar\varsigma_i\left(\tfrac12|\bar{x}_i|^2 - \lambda\right) = 0$, we know that the primal problem $(\mathcal{P}_q)$ has at most $2i_d + 1$ KKT
points $\bar{x}_i$ on the sphere $\tfrac12|\bar{x}_i|^2 = \lambda$. If $a_{i_d+1} > 0$, the equality $\psi(\varsigma) = \lambda$ may
have at most $2i_d$ strictly positive roots.

By using the triality theory, the extremality conditions of the critical points
of the problem $(\mathcal{P}_q)$ can be identified by the following result.

Theorem 8.15. (Global and Local Extrema (Gao, 2004b)) Suppose
that $a_1$ is the smallest eigenvalue of $A$. Then the dual problem $(\mathcal{P}_q^d)$ given
in (8.138) has a unique solution $\bar\varsigma_1$ over the domain $\varsigma > -a_1 > 0$, and $\bar{x}_1$ is
a global minimizer of the problem $(\mathcal{P}_q)$; that is,

\[ P(\bar{x}_1) = \min_{x \in \mathcal{U}_k} P(x) = \max_{\varsigma > -a_1} P^d(\varsigma) = P^d(\bar\varsigma_1). \tag{8.146} \]

If in each interval $(-a_{i+1}, -a_i)$, $i = 1, \ldots, i_d$, the dual algebraic equation
(8.139) has two roots $-a_{i+1} < \bar\varsigma_{2i+1} < \bar\varsigma_{2i} < -a_i$, then $\bar\varsigma_{2i}$ is a local
minimizer of $P^d(\varsigma)$, and $\bar\varsigma_{2i+1}$ is a local maximizer of $P^d(\varsigma)$ over the interval
$(-a_{i+1}, -a_i)$.

Fig. 8.12 Graph of $P^d(\varsigma)$.

Proof. Because for any given $\varsigma > -a_1$, the matrix $A + \varsigma I$ is positive definite,
that is, the total complementary function $\Xi(x, \varsigma)$ is a saddle function, the
saddle minmax theorem leads to (8.146).

The remaining statements in Theorem 8.15 can be proved by the graph of
$P^d(\varsigma)$ (see Figure 8.12).
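Theorem 8.15 suggests a simple numerical recipe: solve the scalar equation $\psi(\varsigma) = \lambda$ on $\varsigma > -a_1$ and recover $\bar{x}_1 = (A + \bar\varsigma_1 I)^{-1} f$. The sketch below applies this to a hypothetical two-dimensional instance with diagonal $A = \operatorname{diag}(-1, 2)$, $f = (0.3, 0.4)$, and $r = 1$. These data, the function names, and the bisection bracket are assumptions chosen for illustration and do not come from Gao (2004b). The result is compared with a brute-force search over a polar grid on the ball.

```python
# Sketch: canonical dual solution of  min 0.5*x^T A x - f^T x  s.t. |x| <= r
# for a hypothetical 2-D instance with diagonal A (one negative eigenvalue).
import math

A_DIAG = (-1.0, 2.0)   # assumed eigenvalues of A
F = (0.3, 0.4)         # assumed vector f
R = 1.0
LAM = 0.5 * R * R      # lambda = r^2 / 2


def primal(x):
    """P(x) = 0.5*x^T A x - f^T x for diagonal A."""
    quad = 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))
    return quad - sum(fi * xi for fi, xi in zip(F, x))


def dual(s):
    """P^d(s) = -0.5*f^T (A + s I)^{-1} f - lam*s."""
    return -0.5 * sum(fi * fi / (a + s) for a, fi in zip(A_DIAG, F)) - LAM * s


def psi_minus_lam(s):
    """Residual of (8.139)/(8.145): 0.5*sum f_i^2/(a_i + s)^2 - lam."""
    return 0.5 * sum(fi * fi / (a + s) ** 2 for a, fi in zip(A_DIAG, F)) - LAM


def bisect(func, lo, hi, iters=200):
    """Plain bisection; assumes func(lo) > 0 > func(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a1 = min(A_DIAG)
# psi decreases monotonically from +inf to 0 on (-a1, +inf): one root there.
s1 = bisect(psi_minus_lam, -a1 + 1e-9, 100.0)
x1 = tuple(fi / (a + s1) for a, fi in zip(A_DIAG, F))

# Brute-force comparison on a polar grid covering the closed ball |x| <= r.
samples = []
for i in range(201):
    rho = R * i / 200
    for j in range(720):
        theta = 2 * math.pi * j / 720
        samples.append((rho * math.cos(theta), rho * math.sin(theta)))
grid_min = min(primal(x) for x in samples)

print(s1, x1, primal(x1), dual(s1), grid_min)
```

For this instance the recovered point lies on the sphere $|x| = r$, the primal and dual values coincide, and the grid search finds no feasible point with a smaller objective value.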


It is interesting to note that on the effective domain $\mathcal{V}_a^*$, the Fenchel-Young
equality $V(\xi) = \langle\xi; \varsigma\rangle - V^\sharp(\varsigma) = (\xi - \lambda)\varsigma$ holds true. Thus, on $\mathcal{U}_a \times \mathcal{V}_a^*$, the
total complementary function

\[ \Xi(x, \varsigma) = \langle\Lambda(x); \varsigma\rangle - V^\sharp(\varsigma) - U(x) = \varsigma\left(\tfrac12|x|^2 - \lambda\right) + \tfrac12 x^T A x - x^T f \tag{8.147} \]

can be viewed as the traditional Lagrangian of the quadratic minimization
problem with the reformulated (canonical) quadratic constraint $\tfrac12|x|^2 \le \lambda$,
which is also called extended Lagrangian (see Gao, 2000a). This example
exhibits a connection between the nonlinear Lagrange multiplier method and
the canonical dual transformation. Based on this observation, the traditional
Lagrange multiplier method can be generalized to solve constrained global
optimization problems.

8.8 General Constrained Global Optimization Problems


In this section, we present an important application of the canonical duality
theory to the following general constrained nonlinear programming problem

\[ \min\{-U(x) : x \in \mathcal{U}_k\}, \tag{8.148} \]

where $U(x)$ is a Gâteaux differentiable function, either a linear or canonical
function, defined on an open convex set $\mathcal{U}_a \subset \mathbb{R}^n$, and the feasible space $\mathcal{U}_k$
is a convex subset of $\mathcal{U}_a$ defined by

\[ \mathcal{U}_k = \{x \in \mathcal{U}_a \subset \mathbb{R}^n \mid g_i(x) \le 0,\ i = 1, \ldots, p\}, \]

in which $g_i(x) : \mathcal{U}_a \to \mathbb{R}$ are convex functions. We show the connection
between the canonical dual transformation and nonlinear Lagrange multiplier
methods and how to use the triality theory to identify global and local optima.


8.8.1 Canonical Form and Total Complementary 
Function 


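First, we need to put this problem in the framework of the canonical systems.
Let the geometrical operator $\xi = \Lambda(x) = \{g_i(x)\} : \mathcal{U}_a \to \mathcal{V}_a \subset \mathbb{R}^p$ be a
vector-valued function. The generalized canonical function

\[ V(\xi) = \begin{cases} 0 & \text{if } \xi \le 0, \\ +\infty & \text{otherwise} \end{cases} \]

is an indicator of the convex cone $\mathcal{V}_a = \{\xi \in \mathbb{R}^p \mid \xi \le 0\}$. Thus, the canonical
form of the constrained problem (8.148) is

\[ \min\{\Pi(x) = V(\Lambda(x)) - U(x) : x \in \mathcal{U}_a\}. \]

By the Fenchel transformation, the conjugate of $V(\xi)$ is an indicator of the
dual cone $\mathcal{V}_a^* = \{\varsigma \in \mathbb{R}^p \mid \varsigma \ge 0\}$; that is,

\[ V^\sharp(\varsigma) = \max\{\langle\xi; \varsigma\rangle - V(\xi) : \xi \in \mathbb{R}^p\} = \begin{cases} 0 & \text{if } \varsigma \ge 0, \\ +\infty & \text{otherwise.} \end{cases} \]

By the theory of convex analysis we have

\[ \varsigma \in \partial V(\xi) \iff \xi \in \partial V^\sharp(\varsigma) \iff \langle\xi; \varsigma\rangle = V(\xi) + V^\sharp(\varsigma); \tag{8.149} \]

that is, $(\xi, \varsigma)$ is a generalized canonical pair on $\mathcal{V}_a \times \mathcal{V}_a^*$ (Gao, 2000c). Thus,
the extended Lagrangian $\Xi(x, \varsigma) = \langle\Lambda(x); \varsigma\rangle - V^\sharp(\varsigma) - U(x)$ in this problem
has a very simple form:

\[ \Xi(x, \varsigma) = -U(x) + \sum_{i=1}^{p}\varsigma_i g_i(x). \tag{8.150} \]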


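We can see here that the canonical dual variable $\varsigma \ge 0 \in \mathbb{R}^p$ is nothing but
a Lagrange multiplier for the constraints $\Lambda(x) = \{g_i(x)\} \le 0$. Let

\[ \mathcal{I}(\bar{x}) = \{i \in \{1, \ldots, p\} \mid g_i(\bar{x}) = 0\} \]

be the index set of the active constraints at $\bar{x}$. By the theory of global
optimization (cf. Horst et al., 2000) we know that if $\bar{x}$ is a local minimizer such
that $\nabla g_i(\bar{x})$, $i \in \mathcal{I}(\bar{x})$, are linearly independent, then the KKT conditions
hold:

\[ g_i(\bar{x}) \le 0, \quad \bar\varsigma_i \ge 0, \quad \bar\varsigma_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, p, \tag{8.151} \]
\[ \nabla U(\bar{x}) = \sum_{i=1}^{p}\bar\varsigma_i \nabla g_i(\bar{x}). \tag{8.152} \]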


Any point $(\bar{x}, \bar\varsigma)$ that satisfies (8.151)-(8.152) is called a KKT stationary
point of the problem (8.148). However, the KKT conditions (8.151)-(8.152)
are only necessary for the minimization problem (8.148). They are sufficient
for a constrained global minimum at $\bar{x}$ provided that, for example, the
functions $P(x) = -U(x)$ and $g_i(x)$, $i = 1, \ldots, p$, are convex. In constrained
global optimization problems, the primal problems may possess many local
minimizers due to the nonconvexity of the objective function and constraints.
Therefore, sufficient optimality conditions play a key role in developing global
algorithms. Here we show that the triality theory can provide such sufficient
conditions.

The complementary function $V^\sharp(\varsigma) = 0$, $\forall\varsigma \in \mathcal{V}_a^*$; therefore in this
constrained optimization problem we have

\[ G_\varsigma(x) = \Xi(x, \varsigma) = -U(x) + \varsigma^T\Lambda(x). \tag{8.153} \]


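For a fixed $\varsigma \in \mathcal{V}_a^*$, if the parametric function $G_\varsigma : \mathcal{U}_a \to \mathbb{R}$ is twice Gâteaux
differentiable, the space $\mathcal{G}$ can be written as

\[ \mathcal{G} = \left\{ (x, \varsigma) \in \mathcal{U}_a \times \mathcal{V}_a^* \;\middle|\; \det\left(\frac{\partial^2 G_\varsigma(x)}{\partial x_i\,\partial x_j}\right) \ne 0 \right\}. \]

Clearly for any given $(x, \varsigma) \in \mathcal{G}$, the dual feasible space $\mathcal{V}_k^*$,

\[ \mathcal{V}_k^* = \left\{ \varsigma \in \mathcal{V}_a^* \;\middle|\; \Lambda_t^T(x)\varsigma = \sum_{i=1}^{p}\varsigma_i\nabla g_i(x) = \nabla U(x),\ \forall x \in \mathcal{U}_a \right\}, \tag{8.154} \]

is nonempty and the $\Lambda$-conjugate transformation

\[ U^\Lambda(\varsigma) = \operatorname{sta}\{\langle\Lambda(x); \varsigma\rangle - U(x) : \forall x \in \mathcal{U}_a\} \]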


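can be well formulated on $\mathcal{V}_k^*$. Because $V^\sharp(\varsigma) = 0$ on $\mathcal{V}_a^*$, the canonical dual
function is the stationary value of the extended Lagrangian with respect to $x$, and the
canonical dual problem can be proposed as the following,

\[ \max\{P^d(\varsigma) = U^\Lambda(\varsigma) : \varsigma \in \mathcal{V}_k^*\}. \tag{8.155} \]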


In the following, we illustrate the foregoing results using some examples. 


8.8.2 Quadratic Minimization with Quadratic 
Constraints 


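Let $U(x) = x^T f - \tfrac12 x^T A x$ and $g(x) = \tfrac12 x^T C x - \lambda$ be quadratic functions,
where $A$ and $C$ are two symmetric matrices in $\mathbb{R}^{n\times n}$, $f \in \mathbb{R}^n$ is a given
vector, and $\lambda \in \mathbb{R}$ is a given constant. Thus the primal problem is:

\[ \min\left\{ P(x) = \tfrac12 x^T A x - f^T x \;:\; \tfrac12 x^T C x \le \lambda,\ x \in \mathbb{R}^n \right\}. \tag{8.156} \]

Because we have only one constraint $g(x) = \tfrac12 x^T C x - \lambda$, the extended
Lagrangian is simply

\[ \Xi(x, \varsigma) = \tfrac12 x^T (A + \varsigma C) x - f^T x - \lambda\varsigma. \tag{8.157} \]

On the dual feasible space

\[ \mathcal{V}_k^* = \{\varsigma \in \mathbb{R} \mid \varsigma \ge 0,\ \det(A + \varsigma C) \ne 0\}, \]

the canonical dual problem (8.155) can be formulated as (see Gao, 2005a):

\[ \max\left\{ P^d(\varsigma) = -\tfrac12 f^T (A + \varsigma C)^{-1} f - \lambda\varsigma \;:\; \varsigma \in \mathcal{V}_k^* \right\}. \tag{8.158} \]

Because in this problem both $\Lambda(x) = \tfrac12 x^T C x - \lambda$ and $U(x) = -\tfrac12 x^T A x + f^T x$
are quadratic functions, $\delta^2 G_\varsigma = A + \varsigma C$. The following result was
obtained recently.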


Theorem 8.16. (Gao, 2005a) Suppose that the matrix $C$ is positive
definite, and $\bar\varsigma \in \mathcal{V}_k^*$ is a critical point of $P^d(\varsigma)$. If $A + \bar\varsigma C$ is positive definite,
the vector

\[ \bar{x} = (A + \bar\varsigma C)^{-1} f \]

is a global minimizer of the primal problem (8.156). However, if $A + \bar\varsigma C$ is
negative definite, the vector $\bar{x} = (A + \bar\varsigma C)^{-1} f$ is a local minimizer of the
primal problem (8.156).


310 D.Y. Gao, H.D. Sherali

Fig. 8.13 Graph of $P(x)$ (left); contours of $P(x)$ and boundary of $\mathcal{U}_k$ (right).

Fig. 8.14 Graphs of $P^d(\varsigma)$.


In two-dimensional space, if we let $a_{11} = 3$, $a_{12} = a_{21} = 0.5$, $a_{22} = -2.0$,
and $c_{11} = 1$, $c_{12} = c_{21} = 0$, $c_{22} = 0.5$, the matrix $A = \{a_{ij}\}$ is indefinite, and
$C = \{c_{ij}\}$ is positive definite. Setting $f = \{1, 1.5\}$ and $\lambda = 2$, the graph of
the canonical function $P(x) = \tfrac12 x^T A x - x^T f$ is a saddle surface (see Figure
8.13), and the boundary of the feasible set $\mathcal{U}_k = \{x \in \mathbb{R}^2 \mid \tfrac12 x^T C x \le \lambda\}$ is
an ellipse (see Figure 8.13). In this case, the dual problem has four critical
points (see Figure 8.14):

\[ \bar\varsigma_1 = 5.22 > \bar\varsigma_2 = 3.32 > \bar\varsigma_3 = -2.58 > \bar\varsigma_4 = -3.97. \]

Because $\bar\varsigma_1 \in \mathcal{V}_+^*$ and $\bar\varsigma_4 \in \mathcal{V}_-^*$, the triality theory tells us that
$\bar{x}_1 = \{-0.22, 2.81\}$ is a global minimizer, and $\bar{x}_4 = \{-1.90, -0.85\}$ is a local
minimizer. From the graph of $P^d(\varsigma)$ we can see that $\bar{x}_2 = \{0.59, -2.70\}$ is a local
minimizer, and $\bar{x}_3 = \{2.0, 0.15\}$ is a local maximizer. We have

\[ P(\bar{x}_1) = -12.44 < P(\bar{x}_2) = -4.91 < P(\bar{x}_3) = 4.03 < P(\bar{x}_4) = 9.53. \]

8.8.3 Quadratic Minimization with Box Constraints


The primal problem solved in this section is finding a global minimizer of a
nonconvex quadratic function over a box constraint:

\[ (\mathcal{P}_b): \quad \min\left\{ P(x) = \tfrac12 x^T A x - f^T x \;:\; \ell^l \le x \le \ell^u \right\}, \tag{8.159} \]

where $x \in \mathbb{R}^n$, and $\ell^l$, $\ell^u$ are two given vectors in $\mathbb{R}^n$. Problems of the form
(8.159) appear frequently in partial differential equations, discretized
optimal control problems, linear least squares problems, and certain successive
quadratic programming methods (cf. Floudas and Visweswaran, 1995).
Particularly, if $\ell^l = 0$ and $\ell^u = 1$, the problem $(\mathcal{P}_b)$ is directly related to one
of the fundamental problems of combinatorial optimization, namely, a
continuous relaxation to the problem of minimizing a quadratic function in 0-1
variables.

In order to solve this problem, we need to reformulate the constraints
in canonical form. Without loss of generality, we assume that $\ell^l = -1$ and
$\ell^u = 1$ (if necessary, a simple linear transformation can be used to convert
the problem to this form):

\[ \min\left\{ P(x) = \tfrac12 x^T A x - f^T x \;:\; x_i^2 \le 1,\ i = 1, \ldots, n \right\}. \tag{8.160} \]


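The constraint in this problem is a vector-valued quadratic function
$\Lambda(x) = \{g_i(x)\} = \{x_i^2 - 1\} \le 0 \in \mathbb{R}^n$. Thus, the canonical dual variable $\varsigma = \{\varsigma_i\}$
should also be a vector in $\mathbb{R}^n$. It has been shown recently that on the dual
feasible space,

\[ \mathcal{V}_k^* = \{\varsigma \in \mathbb{R}^n \mid \varsigma \ge 0,\ \det(A + 2\operatorname{Diag}(\varsigma)) \ne 0\}, \]

where $\operatorname{Diag}(\varsigma) \in \mathbb{R}^{n\times n}$ represents a diagonal matrix with $\varsigma_i$, $i = 1, \ldots, n$, as
its diagonal entries, the canonical dual problem is given by (see Gao, 2007a,b)

\[ \max\left\{ P^d(\varsigma) = -\tfrac12 f^T (A + 2\operatorname{Diag}(\varsigma))^{-1} f - \sum_{i=1}^{n}\varsigma_i \;:\; \varsigma \in \mathcal{V}_k^* \right\}. \tag{8.161} \]

This dual problem can be solved to obtain all the critical points $\bar\varsigma$. It is shown
in Gao (2007a,b) that if

\[ \bar\varsigma \in \mathcal{V}_+^* = \{\varsigma \in \mathbb{R}^n \mid \varsigma \ge 0,\ A + 2\operatorname{Diag}(\varsigma) \text{ is positive definite}\}, \]

then the vector $\bar{x}(\bar\varsigma) = (A + 2\operatorname{Diag}(\bar\varsigma))^{-1} f$ is a global minimizer of the
primal problem.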
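The sufficient condition above can be used as a global-optimality certificate. The sketch below treats a hypothetical two-dimensional instance (the matrix $A$, the vector $f$, and all helper names are assumptions made for this example, not data from Gao, 2007a,b). For each vertex $x$ of the box, the stationarity condition $(A + 2\operatorname{Diag}(\varsigma))x = f$ gives $\varsigma_i = x_i\,(f_i - (Ax)_i)/2$. A vertex is certified when $\varsigma > 0$ and $A + 2\operatorname{Diag}(\varsigma)$ is positive definite. The certified vertex is then compared with a brute-force grid search.

```python
# Sketch: global-optimality certificate for
#     min 0.5*x^T A x - f^T x   subject to  -1 <= x_i <= 1   (n = 2).
# A, F and the helper names are illustrative assumptions.
from itertools import product

A = ((-1.0, 0.5), (0.5, -2.0))   # assumed symmetric, negative definite
F = (1.0, 2.0)                   # assumed vector f


def matvec(m, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return tuple(sum(m[i][j] * x[j] for j in range(2)) for i in range(2))


def primal(x):
    """P(x) = 0.5*x^T A x - f^T x."""
    ax = matvec(A, x)
    return 0.5 * sum(x[i] * ax[i] for i in range(2)) - sum(F[i] * x[i] for i in range(2))


def shifted(s):
    """Entries (g11, g12, g22) of G = A + 2*Diag(s)."""
    return A[0][0] + 2 * s[0], A[0][1], A[1][1] + 2 * s[1]


def dual(s):
    """P^d(s) = -0.5*f^T G^{-1} f - sum(s), using the explicit 2x2 inverse."""
    g11, g12, g22 = shifted(s)
    det = g11 * g22 - g12 * g12
    quad = (g22 * F[0] ** 2 - 2 * g12 * F[0] * F[1] + g11 * F[1] ** 2) / det
    return -0.5 * quad - sum(s)


def is_positive_definite(s):
    """Sylvester's criterion for the 2x2 matrix A + 2*Diag(s)."""
    g11, g12, g22 = shifted(s)
    return g11 > 0 and g11 * g22 - g12 * g12 > 0


certified = []
for x in product((-1.0, 1.0), repeat=2):
    ax = matvec(A, x)
    # Multipliers implied by (A + 2 Diag(s)) x = f at a vertex (x_i^2 = 1).
    s = tuple(x[i] * (F[i] - ax[i]) / 2 for i in range(2))
    if all(si > 0 for si in s) and is_positive_definite(s):
        certified.append((x, s))

# Brute-force check on a uniform grid over the box [-1, 1]^2.
grid = [-1.0 + 0.02 * k for k in range(101)]
grid_min = min(primal((u, v)) for u in grid for v in grid)

print(certified, grid_min)
```

Enumerating vertices is exponential in $n$ and is used here only to keep the example self-contained. In this instance exactly one vertex passes the test, and its objective value matches both the dual value and the grid minimum.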
8.8.4 Concave Minimization


The primal problem in this case is given by

\[ (\mathcal{P}_c): \quad \min\{P(x) = -U(x) : Bx \le b,\ x \in \mathbb{R}^n\}, \tag{8.162} \]

where $U(x)$ is a convex, or even nonsmooth function, and where $B \in \mathbb{R}^{m\times n}$
and $b \in \mathbb{R}^m$ are given. It is well known that this problem is NP-hard.
Concave minimization problems constitute one of the most fundamental and
intensely studied classes of problems in global minimization. A comprehensive
review/survey of the mathematical properties, common applications, and
solution methods is given by Benson (1995). By the use of the canonical dual
transformation, a perfect dual problem has been formulated in Gao (2005a).
In order to provide insights into the connection between the canonical dual
transformation and the traditional Lagrange multiplier method, we
demonstrate here how this perfect dual formulation can also be reproduced by the
classical Lagrangian duality approach when executed in a particular fashion
inspired by the canonical duality.

First, let us introduce a parameter $\mu$ such that

\[ \min\{P(x) : Bx \le b\} \le \mu \le \max\{P(x) : Bx \le b\}. \]

Then the parameterized canonical form of this problem can be formulated as
(see Gao, 2005a)

\[ (\mathcal{P}_\mu): \quad \min\{P(x) = -U(x) : \{U(x) + \mu,\ Bx - b\} \le 0 \in \mathbb{R}^{1+m},\ x \in \mathbb{R}^n\}. \tag{8.163} \]

In this case, the constraint $g_1(x) = U(x) + \mu$ is convex and
$\{g_i(x),\ i = 2, \ldots, m+1\} = Bx - b$ are linear. By introducing Lagrange multipliers
$(\varsigma, y) \in \mathbb{R}^{1+m}$, and letting

\[ \mathcal{V}_a^* = \{(\varsigma, y) \in \mathbb{R}^{1+m} \mid \varsigma \ge 0,\ y \ge 0 \in \mathbb{R}^m\}, \]

the Lagrangian associated with the parameterized canonical problem (8.163) is given
by

\[ \Xi(x, \varsigma, y) = (\varsigma - 1)U(x) + \mu\varsigma + y^T(Bx - b). \]

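Thus, by the classical Lagrangian duality, the dual problem to $(\mathcal{P}_\mu)$ is

\[ (LD): \quad \max_{(\varsigma, y) \in \mathcal{V}_a^*}\left\{ \mu\varsigma - y^T b + \min_{x \in \mathbb{R}^n}\{(\varsigma - 1)U(x) + y^T Bx\} \right\}. \tag{8.164} \]

Because $U(x)$ is convex, the inner minimization problem in this dual form
has a unique solution $x$ if $\varsigma > 1$.

Remark 8.1. Assume that

(1) $U(x)$ is a convex function such that $x^* = \delta U(x)$ is invertible
for each $x \in \mathbb{R}^n$, and the Legendre conjugate function
$U^*(x^*) = \operatorname{sta}\{x^T x^* - U(x) : \delta U(x) = x^*\}$ is uniquely defined in $\mathbb{R}^n$.

(2) An optimum solution $\bar{x}$ to the problem $(\mathcal{P}_\mu)$ is a KKT solution
with Lagrange multipliers $\bar\varsigma > 1$, $\bar{y} \ge 0 \in \mathbb{R}^m$.

Let

\[ \mathcal{V}_k^* = \{(\varsigma, y) \in \mathbb{R}^{1+m} \mid \varsigma > 1,\ y \ge 0 \in \mathbb{R}^m\}. \]

Under Remark 8.1, we can thus write (LD) in (8.164) as

\[ (LD): \quad \max_{(\varsigma, y) \in \mathcal{V}_k^*}\left\{ \mu\varsigma - y^T b + (\varsigma - 1)\min_x\left\{ \frac{y^T Bx}{\varsigma - 1} + U(x) \right\} \right\}. \tag{8.165} \]

Observe that the effect of having introduced $U(x) + \mu \le 0$ is to convexify the
inner minimization problem in (8.165), which, by the assumption of Remark
8.1, reduces (LD) to the following equivalent dual problem.

\[ (\mathcal{P}_\mu^d): \quad \max_{(\varsigma, y) \in \mathcal{V}_k^*}\left\{ P^d(\varsigma, y) = \mu\varsigma - y^T b + (1 - \varsigma)\,U^*\!\left(\frac{B^T y}{1 - \varsigma}\right) \right\}. \tag{8.166} \]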
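The passage from (8.165) to (8.166) rests on one identity, sketched below under the assumptions of Remark 8.1 (convex $U$ with a uniquely defined Legendre conjugate) and $\varsigma > 1$.

```latex
\[
\min_x\left\{ \frac{y^T Bx}{\varsigma - 1} + U(x) \right\}
= -\max_x\left\{ x^T\,\frac{B^T y}{1 - \varsigma} - U(x) \right\}
= -U^*\!\left(\frac{B^T y}{1 - \varsigma}\right),
\]
\[
\text{so that}\qquad
(\varsigma - 1)\min_x\left\{ \frac{y^T Bx}{\varsigma - 1} + U(x) \right\}
= (1 - \varsigma)\,U^*\!\left(\frac{B^T y}{1 - \varsigma}\right).
\]
```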


This is the dual problem proposed by the canonical dual transformation in
Gao (2005a). By the fact that the Legendre conjugate $U^*(x^*)$ of the convex
function $U(x)$ is also convex, this canonical dual is a concave maximization
problem over the dual feasible space $\mathcal{V}_k^*$, which can be solved uniquely for a
given parameter $\mu \in \mathbb{R}$ if $\mathcal{V}_k^*$ is nonempty.

Under Remark 8.1, note that $\bar{x}$ solves the primal problem $(\mathcal{P}_\mu)$ because
$P(\bar{x}) = \mu$, and satisfies the KKT conditions

\[ (\bar\varsigma - 1)\,\delta U(\bar{x}) + B^T\bar{y} = 0, \tag{8.167} \]
\[ B\bar{x} \le b, \quad U(\bar{x}) + \mu = 0, \quad \bar{y}^T(B\bar{x} - b) = 0, \quad \bar{y} \ge 0, \quad \bar\varsigma > 1. \tag{8.168} \]


Writing the (LD) in (8.164) as

\[ \max_{(\varsigma, y) \in \mathcal{V}_k^*} P^d(\varsigma, y), \]

where

\[ P^d(\varsigma, y) = \mu\varsigma - y^T b + \min_x\{(\varsigma - 1)U(x) + y^T Bx\}, \]

we get

\[ P^d(\bar\varsigma, \bar{y}) = \bar\varsigma\mu - b^T\bar{y} + (\bar\varsigma - 1)U(\hat{x}) + \bar{y}^T B\hat{x}, \tag{8.169} \]

where $\hat{x}$ satisfies $\delta U(\hat{x}) = B^T\bar{y}/(1 - \bar\varsigma)$. By (8.167) and the assumed
invertibility of the canonical dual relation $x^* = \delta U(x)$, we get $\hat{x} = \bar{x}$. Substituting
this into (8.169) and using (8.168) yields $P^d(\bar\varsigma, \bar{y}) = P(\bar{x})$; that is, there is
zero duality gap.

Fig. 8.15 Nonsmooth function and its smooth Legendre conjugate: (a) graph of $U(x)$; (b) graph of the Legendre conjugate $U^*(x^*)$.

Furthermore, letting

\[ \mathcal{U}_\mu = \{x \in \mathbb{R}^n \mid Bx \le b,\ -U(x) = \mu\}, \]

we have the following result.


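Theorem 8.17. (KKT Condition and Global Optimality) Under
Remark 8.1, for a given parameter $\mu$, if $(\bar\varsigma, \bar{y}) \in \mathcal{V}_k^*$ is a KKT point of $(\mathcal{P}_\mu^d)$
such that

\[ \bar{x}^* = \frac{B^T\bar{y}}{1 - \bar\varsigma}, \]

then the vector $\bar{x} = \delta U^*(\bar{x}^*)$ is a KKT point of $(\mathcal{P}_\mu)$, and $P(\bar{x}) = P^d(\bar\varsigma, \bar{y})$.
Moreover, if $\bar\varsigma > 1$, then $(\bar\varsigma, \bar{y})$ is a global maximizer of $P^d(\varsigma, y)$ on $\mathcal{V}_k^*$, $\bar{x}$
is a global minimizer of $P(x)$ on the feasible space $\mathcal{U}_\mu$, and

\[ \min_{x \in \mathcal{U}_\mu} P(x) = \max_{(\varsigma, y) \in \mathcal{V}_k^*} P^d(\varsigma, y). \]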


This example shows again that when a nonconvex constrained optimization 
problem can be written in a canonical form, the classical Lagrange multiplier 
method can be used to formulate a perfect dual problem. A detailed study 
on the canonical duality theory for solving general constrained nonconvex 
minimization problems and its connections with Lagrangian duality appears 
in Gao, Ruan, and Sherali (2008). 

One advantage of the canonical duality approach is that if the convex $U(x)$
is nonsmooth on $\mathcal{U}_a$, its Fenchel-Legendre conjugate $U^*$ is a smooth function
on $\mathcal{U}_a^*$ (see Figure 8.15). Such an idea has also been used in the study of
geometrical dual analysis for solving nonsmooth "shape-preserving" design
problems (see Cheng, Fang, and Lavery, 2005, Lavery, 2004, Zhao, Fang, and
Lavery, 2006).

8.9 Sequential Canonical Dual Transformation and
Solutions to Polynomial Minimization Problems


The canonical dual transformation method can be generalized in different 
ways to solve the global optimization problem: 


min{ P(x) = W(x) —U(x) : x€uUag} (8.170) 


with different types of nonconvex functions $W(x) = V(\Lambda(x))$ and geometrical
operators $\Lambda$. If the geometrical operator $\Lambda : \mathcal{U} \to \mathcal{V}$ is a general nonlinear,
nonconvex mapping, we can continue to use the canonical dual transformation
such that the general nonconvex function $W(x)$ can be written in the
canonical form (see Gao, 2000a):

\[ W(x) = V(\Lambda(x)) = V_n(\xi_n(\xi_{n-1}(\cdots(\xi_1(x))\cdots))), \tag{8.171} \]

where $\xi_k(\xi_{k-1})$ is either a convex or a concave function of $\xi_{k-1}$, and we write

\[ V_k(\xi_k) = \xi_{k+1}(\xi_k), \quad k = 1, \dots, n-1. \]

Thus, the geometrical operator $\Lambda : \mathcal{U} \to \mathcal{V}$ in this problem is a sequential
composition of nonlinear mappings $\Lambda^{(k)} : \mathcal{V}_{k-1} \to \mathcal{V}_k$, $k = 1, \dots, n$, $\mathcal{V}_0 = \mathcal{U}$,
and $\mathcal{V}_n = \mathcal{V}$; that is,

\[ \xi_n(x) = \Lambda(x) = \Lambda^{(n)} \circ \Lambda^{(n-1)} \circ \cdots \circ \Lambda^{(1)}(x). \]

Because each $V_k(\xi_k)$ is a canonical function of $\xi_k$, the canonical duality
relation $\varsigma_k = \delta V_k(\xi_k) : \mathcal{V}_k \to \mathcal{V}_k^*$ is one-to-one. It turns out that the Legendre
conjugate

\[ V_k^*(\varsigma_k) = \langle \xi_k ; \varsigma_k \rangle - V_k(\xi_k) \]


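can be uniquely defined. Letting $\varsigma = \{\varsigma_i\} \in \mathbb{R}^n$, the sequential canonical
Lagrangian associated with the general nonconvex problem (8.170) can be
written as (see Gao, 2000a)

\[ \Xi(x, \varsigma) = \langle \Lambda^{(1)}(x) ; \varsigma_n! \rangle - V_n^*(\varsigma) - U(x), \tag{8.172} \]

where $\varsigma_n! := \varsigma_n \varsigma_{n-1} \cdots \varsigma_2 \varsigma_1$, and

\[ V_n^*(\varsigma) = V_n^*(\varsigma_n) + \varsigma_n V_{n-1}^*(\varsigma_{n-1}) + \cdots + \frac{\varsigma_n!}{\varsigma_1} V_1^*(\varsigma_1). \tag{8.173} \]

Thus, the canonical dual problem can be formulated as:

\[ \max\{ P^d(\varsigma) = U^{\Lambda}(\varsigma) - V_n^*(\varsigma) : \varsigma \in \mathcal{V}_k^* \}. \tag{8.174} \]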
For certain given canonical functions $V_k$ and $U$, and the geometrical operator
$\Lambda^{(1)}$, the $\Lambda$-conjugate transformation

\[ U^{\Lambda}(\varsigma) = \mathrm{sta}\{ \langle \Lambda^{(1)}(x) ; \varsigma_n! \rangle - U(x) : \delta\Lambda^{(1)}(x)\,\varsigma_n! = \delta U(x) \} \]


can be well defined on certain dual feasible spaces V;, and the canonical 
dual variables ¢, linearly depend on ¢,. This canonical dual problem can be 
solved very easily. Two sequential canonical dual transformation methods 
have been proposed in Chapter 4 of Gao (2000a). Applications to general 
nonconvex differential equations and chaotic dynamical systems have been 
given in Gao (1998a, 2000b). 
As an application, let us consider the following polynomial minimization
problem

\[ \min\{ P(x) = W(x) - x^T f : x \in \mathbb{R}^n \}, \tag{8.175} \]

where $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$ is a real vector, $f \in \mathbb{R}^n$ is a given vector,
and $W(x)$ is a so-called canonical polynomial of degree $d = 2^{p+1}$ (see Gao,
2000a), defined by

\[ W(x) = \tfrac12\alpha_p\Big( \tfrac12\alpha_{p-1}\Big( \cdots \Big( \tfrac12\alpha_1\Big(\tfrac12|x|^2 - \lambda_1\Big)^2 - \lambda_2 \Big)^2 \cdots - \lambda_{p-1}\Big)^2 - \lambda_p\Big)^2, \tag{8.176} \]

where $\alpha_i$, $\lambda_i$ are given parameters. It is known that the general polynomial
minimization problem is NP-hard even when $d = 4$ (see Nesterov, 2000).
Many numerical methods and algorithms have been suggested recently for 
finding tight lower bounds of general polynomial optimization problems (see 
Lasserre, 2001, Parrilo and Sturmfels, 2003). 

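For the current canonical polynomial minimization problem, the dual problem
has been formulated in Gao (2006); that is,

\[ (\mathcal{P}^d):\quad \max\Big\{ P^d(\varsigma) = -\frac{|f|^2}{2\varsigma_p!} - \sum_{k=1}^{p} \frac{\varsigma_p!}{\varsigma_k!} V_k^*(\varsigma_k) \Big\}, \tag{8.177} \]

where

\[ \varsigma_1 = \varsigma, \qquad \varsigma_k = \alpha_k\Big( \frac{1}{2\alpha_{k-1}}\varsigma_{k-1}^2 - \lambda_k \Big), \quad k = 2, \dots, p. \tag{8.178} \]

In this case, $V_k^*(\varsigma_k)$ is a quadratic function of $\varsigma_k$ defined by

\[ V_k^*(\varsigma_k) = \frac{1}{2\alpha_k}\varsigma_k^2 + \lambda_k\varsigma_k. \]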
The dual problem is a nonlinear program having only one variable $\varsigma \in \mathbb{R}$,
which is much easier to solve than the primal problem. Clearly, for any $\varsigma \neq 0$
and $\varsigma_k^2 \neq 2\alpha_k\lambda_{k+1}$, the dual function $P^d$ is well defined and the criticality
condition $\delta P^d(\varsigma) = 0$ leads to a dual algebraic equation

\[ 2(\varsigma_p!)^2(\alpha_1^{-1}\varsigma + \lambda_1) = |f|^2. \tag{8.179} \]


Theorem 8.18. (Complete Solution Set to Canonical Polynomial
(Gao, 2006)) For any parameters $\alpha_k$ and $\lambda_k$, $k = 1, \dots, p$, and input $f$,
the dual algebraic equation (8.179) has at most $s = 2^{p+1} - 1$ real solutions:
$\bar{\varsigma}_i$, $i = 1, \dots, s$. For each dual solution $\bar{\varsigma} \in \mathbb{R}$, the vector $\bar{x}$ defined by

\[ \bar{x} = (\bar{\varsigma}_p!)^{-1} f \tag{8.180} \]

is a critical point of the primal problem ($\mathcal{P}$) and

\[ P(\bar{x}) = P^d(\bar{\varsigma}). \]

Conversely, every critical point $\bar{x}$ of the polynomial $P(x)$ can be written in
the form (8.180) for some dual solution $\bar{\varsigma} \in \mathbb{R}$.


In the case that $p = 1$, the nonconvex function $W(x) = \frac12\alpha_1(\frac12|x|^2 - \lambda_1)^2$
is a double-well function. The global and local extrema can be identified by
the triality theory given in Theorem 8.6. For the general case of $p > 1$, the
sufficient condition for global minimizer was obtained recently in Gao (2006).


Theorem 8.19. (Sufficient Condition for Global Minimizer) Suppose
that for any arbitrarily given positive parameters $\alpha_k, \lambda_k > 0$, $\forall k \in \{1, \dots, p\}$,
$\bar{\varsigma}$ is a solution of the dual algebraic equation (8.179). If


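then $\bar{\varsigma}$ is a global maximizer of $P^d$ on the open domain $(\varsigma_+, +\infty)$, the vector
$\bar{x} = (\bar{\varsigma}_p!)^{-1} f$ is a global minimizer of the polynomial minimization problem
(8.175), and

\[ P(\bar{x}) = \min_{x \in \mathbb{R}^n} P(x) = \max_{\varsigma > \varsigma_+} P^d(\varsigma) = P^d(\bar{\varsigma}). \tag{8.181} \]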
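In the case of $p = 2$, the nonconvex function $W(x)$ is a canonical polynomial
of degree eight. The dual function $P^d(\varsigma)$ has the form of

\[ P^d(\varsigma) = -\frac{|f|^2}{2\varsigma\varsigma_2} - \Big( \frac{1}{2\alpha_2}\varsigma_2^2 + \lambda_2\varsigma_2 + \varsigma_2\Big( \frac{1}{2\alpha_1}\varsigma^2 + \lambda_1\varsigma \Big) \Big), \tag{8.182} \]

where $\varsigma_2 = \alpha_2\varsigma^2/(2\alpha_1) - \lambda_2\alpha_2$. In this case, the dual algebraic equation
(8.179)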
(a) $\lambda_1 = 0$: Three solutions $\bar{\varsigma}_3 = 0.22 < \bar{\varsigma}_2 = 1.387 < \bar{\varsigma}_1 = 1.45$
(c) $\lambda_1 = 2$: Seven solutions $\{-2.0, -1.45, -1.35, -0.072, 0.07, 1.39, 1.44\}$

Fig. 8.16 Graphs of the algebraic curve $\phi_2(\varsigma)$ (left) and dual function $P^d(\varsigma)$ (right).


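\[ 2\varsigma^2\alpha_2^2\Big( \frac{1}{2\alpha_1}\varsigma^2 - \lambda_2 \Big)^2\Big( \frac{1}{\alpha_1}\varsigma + \lambda_1 \Big) = |f|^2 \tag{8.183} \]

has at most seven real roots $\bar{\varsigma}_i$, $i = 1, \dots, 7$. Let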


and $f = \{0.1, -0.1\}$, $\alpha_1 = 1$, $\alpha_2 = 1$, and $\lambda_2 = 1$. Then, for different values
of $\lambda_1$, the graphs of $\phi_2(\varsigma)$ and $P^d(\varsigma)$ are shown in Figure 8.16. The graphs
of $P(x)$ are shown in Figure 8.17 (for $\lambda_1 = 0$ and $\lambda_1 = 1$) and Figure 8.18
(for $\lambda_1 = 2$). Because $\varsigma_+ = \sqrt{2\alpha_1\lambda_2} = \sqrt{2}$, we can see that the dual function
$P^d(\varsigma)$ is strictly concave for $\varsigma > \varsigma_+ = \sqrt{2}$. The dual algebraic equation


(a) $\lambda_1 = 0$. (b) $\lambda_1 = 1$.

Fig. 8.17 Graphs of $P(x)$.

Fig. 8.18 Graph of $P(x)$ with $\lambda_1 = 2$.


(8.183) has a total of seven real solutions when $\lambda_1 = 2$, and the largest
$\bar{\varsigma}_1 = 2.10 > \varsigma_+ = 2$ gives the global minimizer $\bar{x}_1 = f/\bar{\varsigma}_p! = \{2.29, -0.92\}$,
and $P(\bar{x}_1) = -1.32 = P^d(\bar{\varsigma}_1)$. The smallest $\bar{\varsigma}_7 = -4.0$ gives a local maximizer
$\bar{x}_7 = \{-0.04, 0.02\}$ and $P(\bar{x}_7) = 4.51 = P^d(\bar{\varsigma}_7)$ (see Figure 8.18).
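
The correspondence between dual roots and primal critical points stated in Theorem 8.18 can be checked numerically for $p = 2$. The following is a minimal sketch, not the authors' implementation: it assumes the parameter values quoted for Figure 8.16(c) ($\alpha_1 = \alpha_2 = 1$, $\lambda_1 = 2$, $\lambda_2 = 1$, $f = (0.1, -0.1)$), uses NumPy's generic polynomial root finder for the degree-seven dual algebraic equation (8.183), and all function names are illustrative. It is not intended to reproduce the specific numbers quoted in the text.

```python
import numpy as np

# Assumed parameter values (those quoted for Figure 8.16(c)); hypothetical choice.
alpha1, alpha2 = 1.0, 1.0
lam1, lam2 = 2.0, 1.0
f = np.array([0.1, -0.1])
f2 = float(f @ f)  # |f|^2


def P(x):
    """Primal objective: canonical polynomial of degree eight minus x^T f."""
    xi2 = 0.5 * alpha1 * (0.5 * (x @ x) - lam1) ** 2
    return 0.5 * alpha2 * (xi2 - lam2) ** 2 - x @ f


def grad_P(x):
    """Gradient of P: (sigma_2 * sigma_1) x - f, by the chain rule."""
    s1 = alpha1 * (0.5 * (x @ x) - lam1)
    s2 = alpha2 * (0.5 * alpha1 * (0.5 * (x @ x) - lam1) ** 2 - lam2)
    return s2 * s1 * x - f


def sigma2(s):
    """Second dual variable as a function of the first, cf. (8.178)."""
    return alpha2 * (s ** 2 / (2.0 * alpha1) - lam2)


def P_dual(s):
    """Dual function for p = 2, cf. (8.182)."""
    s2 = sigma2(s)
    return (-f2 / (2.0 * s * s2)
            - (s2 ** 2 / (2.0 * alpha2) + lam2 * s2)
            - s2 * (s ** 2 / (2.0 * alpha1) + lam1 * s))


# Dual algebraic equation (8.183) as a polynomial in s:
# 2 s^2 sigma2(s)^2 (s / alpha1 + lam1) - |f|^2 = 0.
s_sq = np.poly1d([1.0, 0.0, 0.0])
sig2_poly = alpha2 * np.poly1d([1.0 / (2.0 * alpha1), 0.0, -lam2])
lin = np.poly1d([1.0 / alpha1, lam1])
dual_eq = 2.0 * s_sq * sig2_poly * sig2_poly * lin - f2

roots = dual_eq.roots
real_roots = np.sort(roots[np.abs(roots.imag) < 1e-9].real)

# For every real dual root, build the primal point (8.180) and compare.
results = []
for s in real_roots:
    x = f / (s * sigma2(s))
    results.append((s, x, P(x), P_dual(s), np.linalg.norm(grad_P(x))))

for s, x, p_val, d_val, g in results:
    print(f"sigma={s: .4f}  P(x)={p_val: .6f}  Pd(sigma)={d_val: .6f}  |grad P|={g:.1e}")
```

For each real root the gradient norm should vanish and the primal and dual values should agree up to rounding, which is the zero-duality-gap statement of the theorem for this particular instance.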

Detailed studies on solving general polynomial minimization problems are 
given in Gao (2000a, 2006), Lasserre (2001), and Sherali and Tuncbilek (1992, 
1997). 
8.10 Concluding Remarks


We have presented a detailed review on the canonical dual transformation and 
its associated triality theory, with specific applications to nonconvex analysis 
and global optimization problems. Duality plays a key role in modern math- 
ematics and science. The inner beauty of duality theory owes much to the 
fact that many different natural phenomena can be cast in the unified math- 
ematical framework of Figure 8.1. According to the traditional philosophical 
principle of yin-yang duality, The Complementarity of One Yin and One
Yang is the Dao (see Gao, 1996b, Lao Zhi, 400 BC); that is, the constitutive
relations in any physical system should be one-to-one. Niels Bohr realized 
its value in quantum mechanics. His complementarity theory and philosophy 
laid a foundation on which the field of modern physics was developed (Pais, 
1991). In nonconvex analysis and optimization, this one-to-one canonical du- 
ality relation serves as the foundation for the canonical dual transformation 
method. For any given nonconvex problem, as long as the geometrical operator
$\Lambda$ is chosen properly and the tricanonical forms can be characterized
correctly, the canonical dual transformation can be used to establish elegant 
theoretical results and to develop efficient algorithms for robust computa- 
tions. The extended Lagrangian duality and triality theories show promise of 
having significance in many diverse fields. 

As indicated in Gao (2000a), duality in natural systems is a very broad 
and rich field. To theoretical scientists and philosophical thinkers as well 
as great artists, duality has always played a central role in their respective 
fields. It is really “a splendid feeling to realize the unity of a complex of 
phenomena that by physical perception appear to be completely separated” 
(Albert Einstein). It is pleasing to see that more and more knowledgeable 
researchers and scientists are working in this wonderland and exploring the 
intrinsic beauty of nature, often revealed via duality theory. 
Chapter 9


Quantum Computation and Quantum 
Operations 


Stan Gudder 


Summary. Quantum operations play an important role in quantum measure- 
ment, quantum computation, and quantum information theories. We classify 
quantum operations according to certain special properties such as unital, 
tracial, subtracial, self-adjoint, and idempotent. We also consider a type of 
quantum operation called a Lüders map. Examples of quantum operations
that describe noisy quantum channels are discussed. Results concerning itera- 
tions and fixed points of quantum operations are presented. The relationship 
between quantum operations and completely positive maps is discussed and 
the sequential product of quantum effects is considered. 


Key words: Quantum computation, quantum operation, quantum channel, 
quantum information theory 


9.1 Introduction and Basic Definitions 


The main arena for studies in quantum computation and quantum informa- 
tion is a finite-dimensional complex Hilbert space which we denote by H. We 
denote the set of bounded linear operators on H by B(H) and we use the 
notation

\[ \mathcal{B}(H)^+ = \{A \in \mathcal{B}(H) : A \ge 0\}, \quad \mathcal{E}(H) = \{A \in \mathcal{B}(H) : 0 \le A \le I\}, \quad \mathcal{D}(H) = \{\rho \in \mathcal{B}(H)^+ : \mathrm{tr}(\rho) = 1\}. \]

The elements of $\mathcal{E}(H)$ are called effects and the elements of $\mathcal{D}(H)$ are called
states (or density operators). It is clear that $\mathcal{D}(H) \subseteq \mathcal{E}(H) \subseteq \mathcal{B}(H)^+$. Effects
correspond to quantum yes-no measurements that may be unsharp. If a
quantum system is in the state $\rho$, then the probability that the effect $A$ occurs
(has answer yes) is given by $P_\rho(A) = \mathrm{tr}(\rho A)$. As we show, quantum measurements
with more than two possible values (not just yes-no) can be described
by quantum operations. It is easy to check that $\mathcal{D}(H)$ forms a convex subset
of $\mathcal{B}(H)$ and the extreme points of $\mathcal{D}(H)$ are called pure states. The pure
states have the form $P_\psi$ where $P_\psi$ denotes a one-dimensional projection onto
a unit vector $\psi \in H$. If $\rho = P_\psi$ is a pure state, then

\[ P_\rho(A) = \mathrm{tr}(P_\psi A) = \langle A\psi, \psi \rangle. \]


Let $A_i \in \mathcal{B}(H)$, $i = 1, \dots, n$, and let $\mathcal{A} = \{A_i, A_i^* : i = 1, \dots, n\}$. We
call the map $\phi_{\mathcal{A}} : \mathcal{B}(H) \to \mathcal{B}(H)$ given by $\phi_{\mathcal{A}}(B) = \sum A_i B A_i^*$ a quantum
operation and we call the operators $A_i$, $i = 1, \dots, n$, the operation elements
of $\phi_{\mathcal{A}}$. Notice that $\phi_{\mathcal{A}} : \mathcal{B}(H)^+ \to \mathcal{B}(H)^+$; that is, $\phi_{\mathcal{A}}$ preserves positivity.
Also, $\phi_{\mathcal{A}}$ is linear and $A \le B$ implies that $\phi_{\mathcal{A}}(A) \le \phi_{\mathcal{A}}(B)$. We say that
$\phi_{\mathcal{A}}$ is unital, tracial, or subtracial, respectively, in the case $\sum A_i A_i^* = I$,
$\sum A_i^* A_i = I$, or $\sum A_i^* A_i \le I$, respectively. Notice that $\phi_{\mathcal{A}}$ is unital if and
only if $\phi_{\mathcal{A}}(I) = I$, $\phi_{\mathcal{A}}$ is tracial if and only if $\mathrm{tr}(\phi_{\mathcal{A}}(B)) = \mathrm{tr}(B)$ for every
$B \in \mathcal{B}(H)$, and $\phi_{\mathcal{A}}$ is subtracial if and only if $\mathrm{tr}(\phi_{\mathcal{A}}(B)) \le \mathrm{tr}(B)$ for every
$B \in \mathcal{B}(H)^+$. We say that $\phi_{\mathcal{A}}$ is self-adjoint if $A_i = A_i^*$, $i = 1, \dots, n$. An
important type of self-adjoint quantum operation in quantum measurement
theory [4, 7, 9] is a Lüders map of the form $L(B) = \sum A_i^{1/2} B A_i^{1/2}$ where $A_i \in
\mathcal{E}(H)$ with $\sum A_i = I$, $i = 1, \dots, n$. In this case, $L$ is unital and tracial and
$\{A_i : i = 1, \dots, n\}$ is called a finite POV (positive operator-valued) measure.
We interpret the POV measure $\{A_i : i = 1, \dots, n\}$ as a quantum measurement
with $n$ possible values (which can be taken to be $1, \dots, n$). Restricting $L$ to
$\mathcal{E}(H)$ we have $L : \mathcal{E}(H) \to \mathcal{E}(H)$ and $L(B)$ is interpreted as the effect resulting
from first making the measurement described by $\{A_i : i = 1, \dots, n\}$ and then
measuring $B$. If we restrict $L$ to $\mathcal{D}(H)$ then $L : \mathcal{D}(H) \to \mathcal{D}(H)$ is called the
square root dynamics [2].

Quantum operations have various interpretations in quantum measurement,
computation, and information theories [1, 4, 7, 8, 9, 10]. If $\phi_{\mathcal{A}}$ is tracial,
then $\phi_{\mathcal{A}} : \mathcal{D}(H) \to \mathcal{D}(H)$ can be thought of as a quantum measurement
with possible outcomes $1, 2, \dots, n$. If the measurement is performed on a
quantum system in the state $\rho \in \mathcal{D}(H)$, then the probability of obtaining
outcome $i$ is $\mathrm{tr}(A_i \rho A_i^*)$ and the postmeasurement state given that $i$ occurs
is $A_i \rho A_i^* / \mathrm{tr}(A_i \rho A_i^*)$. Moreover, the resulting state after the measurement is
executed but no observation is made is given by $\phi_{\mathcal{A}}(\rho)$. Quantum operations
can also be interpreted as an interaction of a quantum system with an
environment followed by a unitary evolution, a noisy quantum channel, or a
quantum error correction map [10]. Depending on the application, at least
one of our previous properties is assumed to hold. For illustrative purposes,
we mainly consider the noisy quantum channel interpretation.
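
The definitions above translate directly into a few lines of linear algebra. The sketch below is illustrative only: the helper names (`apply_operation`, `is_unital`, `is_tracial`, and so on) are hypothetical, and the example measurement (the sharp computational-basis measurement of a qubit in the state $|+\rangle\langle +|$) is an assumption chosen for simplicity.

```python
import numpy as np


def apply_operation(elements, B):
    """phi_A(B) = sum_i A_i B A_i^* for a list of operation elements."""
    return sum(A @ B @ A.conj().T for A in elements)


def is_unital(elements):
    """Check sum_i A_i A_i^* = I."""
    d = elements[0].shape[0]
    return np.allclose(sum(A @ A.conj().T for A in elements), np.eye(d))


def is_tracial(elements):
    """Check sum_i A_i^* A_i = I."""
    d = elements[0].shape[0]
    return np.allclose(sum(A.conj().T @ A for A in elements), np.eye(d))


def outcome_probabilities(elements, rho):
    """Probability tr(A_i rho A_i^*) of each outcome i."""
    return np.array([np.trace(A @ rho @ A.conj().T).real for A in elements])


def post_measurement_state(A, rho):
    """State A rho A^* / tr(A rho A^*) given that the outcome of A occurred."""
    unnormalized = A @ rho @ A.conj().T
    return unnormalized / np.trace(unnormalized).real


# Hypothetical example: sharp measurement in the computational basis.
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)
elements = [P0, P1]

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())          # pure state |+><+|

probs = outcome_probabilities(elements, rho)
rho_after = apply_operation(elements, rho)  # state when no observation is made
rho_given_0 = post_measurement_state(P0, rho)
```

Here `rho_after` is the unobserved post-measurement state $\phi_{\mathcal{A}}(\rho)$, while `rho_given_0` is the state conditioned on the first outcome.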

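Notice that if $\phi_{\mathcal{A}}$ and $\phi_{\mathcal{B}}$ are quantum operations on $\mathcal{B}(H)$ with $\mathcal{A} =
\{A_i, A_i^* : i = 1, \dots, n\}$, $\mathcal{B} = \{B_j, B_j^* : j = 1, \dots, m\}$, then their composition
$\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ is a quantum operation on $\mathcal{B}(H)$ with operation elements $A_i B_j$,
$i = 1, \dots, n$, $j = 1, \dots, m$. If $\mathcal{A} = \mathcal{B}$ we write $\phi_{\mathcal{A}}^2 = \phi_{\mathcal{A}} \circ \phi_{\mathcal{A}}, \dots,$

\[ \phi_{\mathcal{A}}^n = \phi_{\mathcal{A}} \circ \cdots \circ \phi_{\mathcal{A}} \quad (n \text{ factors}). \]

A quantum operation $\phi_{\mathcal{A}}$ is idempotent if $\phi_{\mathcal{A}}^2 = \phi_{\mathcal{A}}$. We now give some simple
basic results.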


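Lemma 9.1.1. If $\phi_{\mathcal{A}}$ and $\phi_{\mathcal{B}}$ are both unital, tracial, or subtracial, respectively,
then $\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ is unital, tracial, or subtracial, respectively.


Proof. If $\phi_{\mathcal{A}}$ and $\phi_{\mathcal{B}}$ are both unital, then $\sum A_i A_i^* = \sum B_j B_j^* = I$. Hence,

\[ \sum_{i,j} A_i B_j (A_i B_j)^* = \sum_i A_i \Big( \sum_j B_j B_j^* \Big) A_i^* = \sum_i A_i A_i^* = I. \]

Therefore, $\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ is unital. In a similar way, if $\phi_{\mathcal{A}}$ and $\phi_{\mathcal{B}}$ are both tracial
then $\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ is tracial. Now suppose that $\phi_{\mathcal{A}}$ and $\phi_{\mathcal{B}}$ are both subtracial.
Then there exists a $C \in \mathcal{E}(H)$ such that $\sum A_i^* A_i + C = I$. Hence,

\[ \sum_{i,j} (A_i B_j)^* A_i B_j = \sum_{i,j} B_j^* A_i^* A_i B_j = \sum_j B_j^* \Big( \sum_i A_i^* A_i \Big) B_j = \sum_j B_j^* B_j - \sum_j B_j^* C B_j \le \sum_j B_j^* B_j \le I. \]

Therefore, $\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ is subtracial.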


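Lemma 9.1.2. If $\phi_{\mathcal{A}}$ is subtracial and its operation elements are self-adjoint
projection operators, then $\phi_{\mathcal{A}}$ is idempotent.


Proof. We have that $\phi_{\mathcal{A}}(B) = \sum A_i B A_i$ where $A_i = A_i^* = A_i^2$ and $\sum A_i \le I$,
$i = 1, \dots, n$. For $i, j \in \{1, \dots, n\}$, $i \neq j$, we have

\[ A_i + A_j \le \sum_k A_k \le I. \]

It follows that $A_i A_j = A_j A_i = 0$ for $i \neq j$. Hence,

\[ \phi_{\mathcal{A}} \circ \phi_{\mathcal{A}}(B) = \sum_{i,j} A_j A_i B A_i A_j = \sum_i A_i B A_i = \phi_{\mathcal{A}}(B) \]

so that $\phi_{\mathcal{A}}$ is idempotent.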
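
Both lemmas rest on the fact that the composition $\phi_{\mathcal{A}} \circ \phi_{\mathcal{B}}$ has operation elements $A_i B_j$. A small numerical sketch of Lemma 9.1.2 follows; the two orthogonal projections on $\mathbb{C}^3$ and the random test matrix are assumptions made for illustration, and the helper names are hypothetical.

```python
import numpy as np


def apply_operation(elements, B):
    """phi_A(B) = sum_i A_i B A_i^*."""
    return sum(A @ B @ A.conj().T for A in elements)


def compose_elements(elems_a, elems_b):
    """Operation elements A_i B_j of the composition phi_A o phi_B."""
    return [A @ Bj for A in elems_a for Bj in elems_b]


# Two orthogonal self-adjoint projections on C^3 with P1 + P2 <= I,
# so the associated quantum operation is subtracial.
P1 = np.diag([1.0, 0.0, 0.0]).astype(complex)
P2 = np.diag([0.0, 1.0, 0.0]).astype(complex)
proj = [P1, P2]

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

once = apply_operation(proj, B)
twice_nested = apply_operation(proj, apply_operation(proj, B))
twice_composed = apply_operation(compose_elements(proj, proj), B)
```

Applying the map twice, either by nesting or through the composed element list, returns the same matrix as applying it once, as the lemma predicts.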
9.2 Completely Positive Maps


In Section 9.1 we defined a quantum operation as a map $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$
of the form

\[ \phi(B) = \sum A_i B A_i^* \tag{9.1} \]

and in Section 9.3 we give some simple practical examples of quantum operations.
But why do quantum operations have the operator-sum form (9.1)?
The present section tries to answer this question in terms of completely positive
maps.

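We can consider $M_k = \mathcal{B}(\mathbb{C}^k)$ as the set of all $k \times k$ complex matrices, $k =
1, 2, \dots$. The set of operators in the tensor product $\mathcal{B}(H) \otimes M_k = \mathcal{B}(H \otimes \mathbb{C}^k)$
can be considered to be the set of $k \times k$ matrices with entries in $\mathcal{B}(H)$. For
example if $A, B, C, D \in \mathcal{B}(H)$, then the matrix

\[ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \]

is an element of $\mathcal{B}(H) \otimes M_2$. Of course, $M \in \mathcal{B}(H \otimes \mathbb{C}^2)$ in the sense that

\[ M \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax + By \\ Cx + Dy \end{bmatrix} \]

for all $x, y \in H$. For a linear map $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ we define the linear maps
$\phi_k : \mathcal{B}(H) \otimes M_k \to \mathcal{B}(H) \otimes M_k$ given by

\[ \phi_k(M) = [\phi(M_{ij})], \]

where $M = [M_{ij}] \in \mathcal{B}(H) \otimes M_k$, $i, j = 1, \dots, k$. If $\phi_k$ sends positive operators
into positive operators for $k = 1, 2, \dots$, then $\phi$ is called completely positive.
It is easy to check that $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ is completely positive if and only
if $\phi \otimes I_k : \mathcal{B}(H) \otimes M_k \to \mathcal{B}(H) \otimes M_k$ preserves positivity for $k = 1, 2, \dots$,
where $I_k$ is the identity map on $M_k$.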

We have seen that a quantum operation $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ describes various
ways that states are transformed into other states for a quantum system.
Because states are positive operators, $\phi$ must preserve positivity. Now suppose
our quantum system interacts (or couples) with an environment such as
a noisy quantum channel. If this environment is described by the Hilbert
space $\mathbb{C}^k$, then the combined system is described by the tensor product
$H \otimes \mathbb{C}^k$. The natural extension of $\phi$ to the combined system is given by
$\phi \otimes I_k : \mathcal{B}(H \otimes \mathbb{C}^k) \to \mathcal{B}(H \otimes \mathbb{C}^k)$. The map $\phi \otimes I_k$ just acts on $\mathcal{B}(H)$ like $\phi$
and leaves the environment unaltered. We would expect $\phi \otimes I_k$ to map states
into states so $\phi \otimes I_k$ should also preserve positivity, $k = 1, 2, \dots$. We conclude
that quantum operations should be completely positive maps.

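If $x, y \in H$ we define the linear operator $|x\rangle\langle y| \in \mathcal{B}(H)$ by $|x\rangle\langle y| v = \langle y, v\rangle x$
for every $v \in H$. If $\{x_1, \dots, x_n\}$ is an orthonormal basis for $H$, then any
$A \in \mathcal{B}(H)$ has the form

\[ A = \sum_{i,j} a_{ij} |x_i\rangle\langle x_j|, \tag{9.2} \]

where $a_{ij} \in \mathbb{C}$, $i, j = 1, \dots, n$. Now let $\{y_1, \dots, y_k\}$ be an orthonormal basis
for $\mathbb{C}^k$. Then an orthonormal basis for $H \otimes \mathbb{C}^k$ is given by

\[ \{x_i \otimes y_j : i = 1, \dots, n,\ j = 1, \dots, k\}. \]

For an operator $M \in \mathcal{B}(H \otimes \mathbb{C}^k)$ as in (9.2) we have

\[
\begin{aligned}
M &= \sum_{r,s,i,j} a_{r,s,i,j}\, |x_r \otimes y_i\rangle\langle x_s \otimes y_j| \\
  &= \sum_{r,s,i,j} a_{r,s,i,j}\, |x_r\rangle\langle x_s| \otimes |y_i\rangle\langle y_j| \\
  &= \sum_{i,j} \Big( \sum_{r,s} a_{r,s,i,j}\, |x_r\rangle\langle x_s| \Big) \otimes |y_i\rangle\langle y_j| \\
  &= \sum_{i,j} A_{ij} \otimes |y_i\rangle\langle y_j|,
\end{aligned}
\tag{9.3}
\]

where

\[ A_{ij} = \sum_{r,s} a_{r,s,i,j}\, |x_r\rangle\langle x_s| \in \mathcal{B}(H). \]

If $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ is a linear map and $M \in \mathcal{B}(H \otimes \mathbb{C}^k)$ has the representation
(9.3), then $\phi \otimes I_k : \mathcal{B}(H \otimes \mathbb{C}^k) \to \mathcal{B}(H \otimes \mathbb{C}^k)$ satisfies

\[ (\phi \otimes I_k)(M) = \sum_{i,j} \phi(A_{ij}) \otimes |y_i\rangle\langle y_j|. \tag{9.4} \]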

The following structure theorem is due to Choi [6]. 


Theorem 9.2.1. A linear map $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ is completely positive if
and only if there exist a finite number of operators $A_i \in \mathcal{B}(H)$ such that (9.1)
holds for every $B \in \mathcal{B}(H)$.


Proof. Suppose $\phi$ has the representation (9.1). Applying (9.4) we have

\[ (\phi \otimes I_k)(M) = \sum_{i,j} \phi(A_{ij}) \otimes |y_i\rangle\langle y_j| = \sum_{r,i,j} A_r A_{ij} A_r^* \otimes |y_i\rangle\langle y_j|. \]

Now any $z \in H \otimes \mathbb{C}^k$ can be represented in the form

\[ z = \sum_s u_s \otimes v_s, \]

where $u_s \in H$, $v_s \in \mathbb{C}^k$. Writing $z_r = \sum_s A_r^* u_s \otimes v_s$ it is easy to check that

\[ \langle (\phi \otimes I_k)(M) z, z \rangle = \sum_r \langle M z_r, z_r \rangle \ge 0 \]

because $M$ is positive.
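Conversely, let $\phi : \mathcal{B}(H) \to \mathcal{B}(H)$ be a completely positive map. Let
$\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$ be two orthonormal bases for $H$. Now $\phi \otimes I_n$
is positivity preserving. The operator $M \in \mathcal{B}(H \otimes H)$ defined by

\[
\begin{aligned}
M &= \sum_{r,s} |x_r\rangle\langle x_s| \otimes |y_r\rangle\langle y_s| \\
  &= \sum_{r,s} |x_r \otimes y_r\rangle\langle x_s \otimes y_s| \\
  &= \Big| \sum_r x_r \otimes y_r \Big\rangle \Big\langle \sum_s x_s \otimes y_s \Big|
\end{aligned}
\]

is positive because $M$ is a multiple of a one-dimensional projection. Hence,

\[ (\phi \otimes I_n)(M) = \sum_{r,s} \phi(|x_r\rangle\langle x_s|) \otimes |y_r\rangle\langle y_s| \tag{9.5} \]

is a positive operator. By the spectral theorem there exists an orthonormal
basis $\{v_1, \dots, v_m\}$ of $H \otimes H$ where $m = n^2$ and positive numbers $\lambda_1, \dots, \lambda_m$
such that

\[ (\phi \otimes I_n)(M) = \sum_i \lambda_i |v_i\rangle\langle v_i| = \sum_i \big| \sqrt{\lambda_i}\, v_i \big\rangle \big\langle \sqrt{\lambda_i}\, v_i \big|. \tag{9.6} \]

If $v = \sum_{i,j} a_{ij}\, x_i \otimes y_j$ is a vector in $H \otimes H$ we associate with $v$ an operator
$A_v \in \mathcal{B}(H)$ by

\[ A_v = \sum_{i,j} a_{ij} |x_i\rangle\langle x_j|. \tag{9.7} \]

Then a straightforward computation gives

\[ |v\rangle\langle v| = \sum_{r,s} A_v |x_r\rangle\langle x_s| A_v^* \otimes |y_r\rangle\langle y_s|. \tag{9.8} \]

Associating with each $\sqrt{\lambda_i}\, v_i$ in (9.6) the operator $A_i$ in (9.7) and using (9.8)
we have

\[ (\phi \otimes I_n)(M) = \sum_{i,r,s} A_i |x_r\rangle\langle x_s| A_i^* \otimes |y_r\rangle\langle y_s|. \tag{9.9} \]

Applying (9.5) and (9.9) gives

\[ \phi(|x_r\rangle\langle x_s|) = \sum_i A_i |x_r\rangle\langle x_s| A_i^*. \]

Because the operators $|x_r\rangle\langle x_s|$ span the whole space $\mathcal{B}(H)$, we conclude that
(9.1) holds for every $B \in \mathcal{B}(H)$.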
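
The converse direction of the proof is constructive and can be followed step by step in code: form the operator in (9.5), diagonalize it as in (9.6), and reshape each scaled eigenvector into an operator as in (9.7). The sketch below assumes the standard basis for both copies of $H$ and uses row-major reshaping, which matches the ordering of `numpy.kron`; the function names and the test map (the linear extension $B \mapsto p\,\mathrm{tr}(B) I/2 + (1-p)B$ of a qubit depolarizing channel with $p = 0.3$) are hypothetical choices.

```python
import numpy as np


def choi_matrix(phi, n):
    """(phi ⊗ I_n)(M) = sum_{r,s} phi(|x_r><x_s|) ⊗ |y_r><y_s|, cf. (9.5)."""
    C = np.zeros((n * n, n * n), dtype=complex)
    for r in range(n):
        for s in range(n):
            E = np.zeros((n, n), dtype=complex)
            E[r, s] = 1.0
            C += np.kron(phi(E), E)
    return C


def operation_elements(phi, n, tol=1e-12):
    """Recover operation elements from the spectral decomposition (9.6)-(9.7)."""
    lam, V = np.linalg.eigh(choi_matrix(phi, n))
    # Each eigenvector v = sum a_ij x_i ⊗ y_j becomes the matrix [a_ij].
    return [np.sqrt(l) * V[:, k].reshape(n, n)
            for k, l in enumerate(lam) if l > tol]


# Hypothetical completely positive test map on 2x2 matrices.
p = 0.3
phi = lambda B: p * np.trace(B) * np.eye(2) / 2 + (1 - p) * B

elems = operation_elements(phi, 2)

rng = np.random.default_rng(1)
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rebuilt = sum(A @ B @ A.conj().T for A in elems)
```

The recovered elements need not coincide with any particular textbook choice of operation elements; they only have to reproduce the same map.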


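We now show that the operator-sum representation (9.1) is not unique. In
other words, the operation elements for a quantum operation are not unique.
Let $\phi$ and $\psi$ be quantum operations acting on $\mathcal{B}(\mathbb{C}^2)$ with operator-sum
representations

\[ \phi(B) = E_1 B E_1^* + E_2 B E_2^*, \qquad \psi(B) = F_1 B F_1^* + F_2 B F_2^*, \]

where

\[ E_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad E_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad F_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad F_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \]

Although $\phi$ and $\psi$ appear to be quite different, they are actually the same
quantum operation. To see this, note that $F_1 = \frac{1}{\sqrt{2}}(E_1 + E_2)$ and $F_2 =
\frac{1}{\sqrt{2}}(E_1 - E_2)$. Thus

\[ \psi(B) = \tfrac12 (E_1 + E_2) B (E_1 + E_2) + \tfrac12 (E_1 - E_2) B (E_1 - E_2) = E_1 B E_1 + E_2 B E_2 = \phi(B). \]

Notice that in the previous example we could write $F_i = \sum_j u_{ij} E_j$ where
$[u_{ij}]$ is the unitary matrix

\[ \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]

In this sense, the operation elements of $\psi$ are related to the operation elements
of $\phi$ by a unitary matrix. The next theorem, whose proof may be found in
[10], shows that this holds in general.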
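
The example can be verified numerically. The following sketch assumes the matrices $E_1 = I/\sqrt{2}$, $E_2 = Z/\sqrt{2}$, $F_1 = |0\rangle\langle 0|$, $F_2 = |1\rangle\langle 1|$ and checks both that the two operator sums agree on a random matrix and that the $F_i$ arise from the $E_j$ through the unitary matrix $[u_{ij}]$; variable names are illustrative.

```python
import numpy as np

E1 = np.eye(2) / np.sqrt(2)
E2 = np.diag([1.0, -1.0]) / np.sqrt(2)
F1 = np.diag([1.0, 0.0])
F2 = np.diag([0.0, 1.0])

# Unitary matrix relating the two sets of operation elements.
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def apply_operation(elements, B):
    """sum_i A_i B A_i^* for a list of operation elements."""
    return sum(A @ B @ A.conj().T for A in elements)


rng = np.random.default_rng(2)
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

phi_B = apply_operation([E1, E2], B)
psi_B = apply_operation([F1, F2], B)

# F_i = sum_j u_ij E_j
F_from_E = [U[i, 0] * E1 + U[i, 1] * E2 for i in range(2)]
```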


Theorem 9.2.2. Suppose $\{E_1, \dots, E_n\}$ and $\{F_1, \dots, F_m\}$ are operation elements
giving rise to subtracial quantum operations $\phi$ and $\psi$, respectively.
By appending zero operators to the shorter list of operation elements we may
assume that $m = n$. Then $\phi = \psi$ if and only if there exist complex numbers
$u_{ij}$ such that $F_i = \sum_j u_{ij} E_j$ where $[u_{ij}]$ is an $m \times m$ unitary matrix.


This theorem is important in the development of quantum error-correcting
codes [10]. Suppose we have two representations

\[ \phi(B) = \sum E_i B E_i^* = \sum F_j B F_j^* \]

for the quantum operation $\phi$.


Lemma 9.2.3. The quantum operation $\phi$ is unital, tracial, or subtracial, respectively,
with respect to the operation elements $\{E_1, \dots, E_n\}$ if and only if
$\phi$ is unital, tracial, or subtracial, respectively, with respect to the operation
elements $\{F_1, \dots, F_m\}$.


Proof. If $\phi$ is unital with respect to $\{E_1, \dots, E_n\}$, then

\[ \sum F_j F_j^* = \phi(I) = \sum E_i E_i^* = I \]

so $\phi$ is unital with respect to $\{F_1, \dots, F_m\}$. If $\phi$ is tracial with respect to
$\{E_1, \dots, E_n\}$, then for any $B \in \mathcal{B}(H)$ we have

\[ \mathrm{tr}(B) = \mathrm{tr}(\phi(B)) = \mathrm{tr}\Big( \sum F_j B F_j^* \Big) = \mathrm{tr}\Big( \sum F_j^* F_j B \Big). \]

It follows that $\sum F_j^* F_j = I$ so $\phi$ is tracial with respect to $\{F_1, \dots, F_m\}$. The
subtracial proof is similar.


This last lemma does not apply to self-adjoint quantum operations. For
example, if $\phi(B) = \sum A_i B A_i^*$ where the $A_i$ are self-adjoint we can also write
$\phi(B) = \sum (iA_i) B (iA_i)^*$ where the $iA_i$ are not self-adjoint.

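We now give an example which shows that a positivity preserving map
need not be completely positive. Define $\phi : \mathcal{B}(\mathbb{C}^2) \to \mathcal{B}(\mathbb{C}^2)$ by $\phi(A) = A^T$
where $A^T$ is the transpose of $A$. Now a matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in \mathcal{B}(\mathbb{C}^2) \]

is positive if and only if $a \ge 0$, $d \ge 0$, and $ad - bc \ge 0$. Hence, if $A \ge 0$
then $A^T \ge 0$ so $\phi$ is positivity preserving. To show that $\phi$ is not completely
positive consider $\phi \otimes I_2$ on $\mathcal{B}(\mathbb{C}^2 \otimes \mathbb{C}^2)$. Let $e_1 = (1, 0)$, $e_2 = (0, 1)$ be the
standard basis for $\mathbb{C}^2$ and define the positive operator $A \in \mathcal{B}(\mathbb{C}^2 \otimes \mathbb{C}^2)$ by

\[
\begin{aligned}
A &= |e_1 \otimes e_1 + e_2 \otimes e_2\rangle\langle e_1 \otimes e_1 + e_2 \otimes e_2| \\
  &= |e_1 \otimes e_1\rangle\langle e_1 \otimes e_1| + |e_1 \otimes e_1\rangle\langle e_2 \otimes e_2| + |e_2 \otimes e_2\rangle\langle e_1 \otimes e_1| + |e_2 \otimes e_2\rangle\langle e_2 \otimes e_2| \\
  &= |e_1\rangle\langle e_1| \otimes |e_1\rangle\langle e_1| + |e_1\rangle\langle e_2| \otimes |e_1\rangle\langle e_2| + |e_2\rangle\langle e_1| \otimes |e_2\rangle\langle e_1| + |e_2\rangle\langle e_2| \otimes |e_2\rangle\langle e_2|.
\end{aligned}
\]

We then have

\[
\begin{aligned}
(\phi \otimes I_2)(A) &= |e_1\rangle\langle e_1| \otimes |e_1\rangle\langle e_1| + |e_2\rangle\langle e_1| \otimes |e_1\rangle\langle e_2| + |e_1\rangle\langle e_2| \otimes |e_2\rangle\langle e_1| + |e_2\rangle\langle e_2| \otimes |e_2\rangle\langle e_2| \\
  &= |e_1 \otimes e_1\rangle\langle e_1 \otimes e_1| + |e_2 \otimes e_1\rangle\langle e_1 \otimes e_2| + |e_1 \otimes e_2\rangle\langle e_2 \otimes e_1| + |e_2 \otimes e_2\rangle\langle e_2 \otimes e_2| \\
  &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\end{aligned}
\]

But letting $v = (0, 1, -1, 0) \in \mathbb{C}^2 \otimes \mathbb{C}^2$ we have

\[ \langle (\phi \otimes I_2)(A) v, v \rangle = \langle (0, -1, 1, 0), (0, 1, -1, 0) \rangle = -2. \]

Hence, $\phi \otimes I_2$ is not positivity preserving so $\phi$ is not completely positive.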
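
The computation above can be reproduced in a few lines. This sketch assumes the ordering $e_i \otimes e_j \mapsto$ index $2i + j$ (the convention of `numpy.kron`) and implements $\phi \otimes I_2$ as a partial transpose on the first tensor factor by reshaping the $4 \times 4$ matrix into a four-index array.

```python
import numpy as np

# Vector e1⊗e1 + e2⊗e2 and the positive operator A = |w><w|.
w = np.zeros(4)
w[0] = w[3] = 1.0
A = np.outer(w, w)

# (phi ⊗ I_2)(A): transpose acting on the first tensor factor only.
# Index layout after reshape: [i, j, k, l] with row (i, j) and column (k, l);
# swapping i and k transposes the first factor.
T = A.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)

v = np.array([0.0, 1.0, -1.0, 0.0])
quadratic_form = v @ T @ v

eig_A = np.linalg.eigvalsh(A)
eig_T = np.linalg.eigvalsh(T)
```

The operator `A` has no negative eigenvalues, whereas its partial transpose `T` does, which is exactly the failure of complete positivity described in the text.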


9.3 Noisy Quantum Channels 


This section discusses the quantum operation descriptions of some simple 
noisy quantum channels [10]. A two-dimensional quantum system is called a 
qubit. This is the most basic quantum system studied in quantum computa- 
tion and quantum information theory. A qubit has a two-dimensional state 
space $\mathbb{C}^2$ with (computational) basis elements $|0\rangle = (1, 0)$ and $|1\rangle = (0, 1)$.
The bit flip channel flips the state of a qubit from $|0\rangle$ to $|1\rangle$ (and vice versa)
with probability $1 - p$, $0 < p < 1$. Letting $X$ be the Pauli matrix

\[ X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

we can represent the bit flip channel by the quantum operation

\[ \phi_{bf}(\rho) = p\rho + (1 - p) X \rho X. \]

Notice that $\phi_{bf}$ has operation elements $\{p^{1/2} I, (1 - p)^{1/2} X\}$ and that $\phi_{bf}$ is
self-adjoint and tracial. It is also unital because for any self-adjoint quantum
operation tracial and unital are equivalent. Of course, $\phi_{bf}$ gives a bit flip
because $X|0\rangle = |1\rangle$ and $X|1\rangle = |0\rangle$. Hence,

\[ \phi_{bf}(|0\rangle\langle 0|) = p|0\rangle\langle 0| + (1 - p)|1\rangle\langle 1| \]

so the pure state $|0\rangle\langle 0|$ is left undisturbed with probability $p$ and is flipped
with probability $1 - p$. Similarly,

\[ \phi_{bf}(|1\rangle\langle 1|) = p|1\rangle\langle 1| + (1 - p)|0\rangle\langle 0|. \]
The phase flip channel is represented by the quantum operation

\[ \phi_{pf}(\rho) = p\rho + (1 - p) Z \rho Z, \]

where $0 < p < 1$ and $Z$ is the Pauli matrix

\[ Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \]

The operation elements for $\phi_{pf}$ are $\{p^{1/2} I, (1 - p)^{1/2} Z\}$ so again $\phi_{pf}$ is
self-adjoint and tracial. Because $Z|0\rangle = |0\rangle$ and $Z|1\rangle = -|1\rangle$ we see that $\phi_{pf}$
changes the relative phase of the qubit states with probability $1 - p$.
The bit-phase flip channel is represented by the quantum operation

\[ \phi_{bpf}(\rho) = p\rho + (1 - p) Y \rho Y, \]

where $0 < p < 1$ and $Y$ is the Pauli matrix

\[ Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}. \]

This gives a combination of a bit flip and a phase flip because $Y = iXZ$. The
operation elements for $\phi_{bpf}$ are $\{p^{1/2} I, (1 - p)^{1/2} Y\}$ so $\phi_{bpf}$ is self-adjoint
and tracial. We obtain an interesting quantum operation by forming the
composition $\phi_{bf} \circ \phi_{pf}$. Because $XZ = -iY$ we have

\[ \phi_{bf} \circ \phi_{pf}(\rho) = p^2 \rho + p(1 - p) Z \rho Z + p(1 - p) X \rho X + (1 - p)^2 Y \rho Y. \]

The operation elements become

\[ \big\{ pI,\ \sqrt{p(1 - p)}\, Z,\ \sqrt{p(1 - p)}\, X,\ (1 - p) Y \big\} \]

so again, $\phi_{bf} \circ \phi_{pf}$ is self-adjoint and tracial. It is also easy to check that

\[ \phi_{bf} \circ \phi_{pf} = \phi_{pf} \circ \phi_{bf}. \]
Another important type of quantum noise is the depolarizing channel given
by the quantum operation

\[ \phi_{dp}(\rho) = \frac{pI}{2} + (1 - p)\rho, \]

where $0 < p < 1$. This channel depolarizes a qubit state with probability
$p$. That is, the state $\rho$ is replaced by the completely mixed state $I/2$ with
probability $p$. By applying the identity

\[ \frac{I}{2} = \frac{\rho + X\rho X + Y\rho Y + Z\rho Z}{4} \]

that holds for every $\rho \in \mathcal{D}(\mathbb{C}^2)$ we can write

\[ \phi_{dp}(\rho) = \Big( 1 - \frac{3p}{4} \Big) \rho + \frac{p}{4} (X\rho X + Y\rho Y + Z\rho Z). \]

Thus, the operation elements for $\phi_{dp}$ become

\[ \big\{ \sqrt{1 - 3p/4}\, I,\ \sqrt{p}\, X/2,\ \sqrt{p}\, Y/2,\ \sqrt{p}\, Z/2 \big\}. \]

As before, $\phi_{dp}$ is self-adjoint and tracial.

There are practical quantum operations that are not self-adjoint or unital.
For example, consider the amplitude damping channel given by the quantum
operation

\[ \phi_{ad}(\rho) = A_1 \rho A_1^* + A_2 \rho A_2^*, \]

where

\[ A_1 = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{1 - \gamma} \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{bmatrix} \]

and $0 < \gamma < 1$. It is easy to check that $\phi_{ad}$ is tracial but not self-adjoint
nor unital. Although the quantum channels (quantum operations) that we
have considered appear to be quite specialized, general quantum channels
and quantum operations can be constructed in terms of these simple ones
and this is important for the theory of quantum error correction.
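
The channels of this section are easy to experiment with. The sketch below builds their operation elements and checks three statements made above on one sample state: the two forms of the depolarizing channel agree, the bit flip and phase flip channels commute, and the amplitude damping channel is tracial but not unital. The parameter values $p = 0.3$, $\gamma = 0.4$, the sample density matrix, and the function names are assumptions chosen for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def apply_operation(elements, rho):
    """sum_i A_i rho A_i^*."""
    return sum(A @ rho @ A.conj().T for A in elements)


def flip_channel(pauli, p):
    """Operation elements {sqrt(p) I, sqrt(1-p) P} for P in {X, Z, Y}."""
    return [np.sqrt(p) * I2, np.sqrt(1 - p) * pauli]


def depolarizing(p):
    """Operation elements of the depolarizing channel."""
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p) / 2 * P for P in (X, Y, Z)]


def amplitude_damping(gamma):
    """Operation elements A_1, A_2 of the amplitude damping channel."""
    A1 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    A2 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [A1, A2]


def gram_sums(elements):
    """Return (sum A_i A_i^*, sum A_i^* A_i), used for unital / tracial tests."""
    return (sum(A @ A.conj().T for A in elements),
            sum(A.conj().T @ A for A in elements))


p, gamma = 0.3, 0.4
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # sample density matrix

dp_direct = p * I2 / 2 + (1 - p) * rho
dp_kraus = apply_operation(depolarizing(p), rho)

bf_pf = apply_operation(flip_channel(X, p), apply_operation(flip_channel(Z, p), rho))
pf_bf = apply_operation(flip_channel(Z, p), apply_operation(flip_channel(X, p), rho))

ad_unital_sum, ad_tracial_sum = gram_sums(amplitude_damping(gamma))
```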


9.4 Iterations 


It is sometimes important to consider iterations of quantum operations. For 
example, a measurement may be repeated many times for greater accuracy 
or quantum data may enter a noisy channel several times. For a quantum 
operation $\phi_{\mathcal{A}}$, does the sequence of iterations $\phi_{\mathcal{A}}^n(\rho)$, $n = 1, 2, \dots$, converge
for every state $\rho \in \mathcal{D}(H)$? (Because $H$ is finite-dimensional, all the usual
forms of convergence such as norm convergence or matrix entry convergence
coincide so we do not need to specify a particular type of convergence.) In
general, the answer is no. For example, $\phi(\rho) = X\rho X$ is a self-adjoint, tracial,
and unital quantum operation. Because $X^2 = I$ we have $\phi^{2n}(\rho) = \rho$, $n =
1, 2, \dots$, but $\phi^{2n+1}(\rho) = X\rho X$, $n = 1, 2, \dots$. Unless $\rho X = X\rho$, the sequence
of iterates does not converge.

A state $\rho_0$ is a fixed point of a quantum operation $\phi_{\mathcal{A}}$ if $\phi_{\mathcal{A}}(\rho_0) = \rho_0$. It
is frequently useful to know the fixed points of a quantum operation because
these are the states that are not disturbed by a quantum measurement or a
noisy quantum channel.


Lemma 9.4.1. A state $\rho_0$ is a fixed point of $\phi_{\mathcal{A}}$ if and only if there exists a
state $\rho$ such that $\lim \phi_{\mathcal{A}}^n(\rho) = \rho_0$.
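Proof. If $\lim \phi_{\mathcal{A}}^n(\rho) = \rho_0$, by the continuity of $\phi_{\mathcal{A}}$ we have that

\[ \rho_0 = \lim \phi_{\mathcal{A}}^{n+1}(\rho) = \lim \phi_{\mathcal{A}} \circ \phi_{\mathcal{A}}^n(\rho) = \phi_{\mathcal{A}}(\lim \phi_{\mathcal{A}}^n(\rho)) = \phi_{\mathcal{A}}(\rho_0). \]

Hence, $\rho_0$ is a fixed point of $\phi_{\mathcal{A}}$. Conversely, if $\rho_0$ is a fixed point of $\phi_{\mathcal{A}}$ we
have that

\[ \phi_{\mathcal{A}}^n(\rho_0) = \phi_{\mathcal{A}}^{n-1}(\phi_{\mathcal{A}}(\rho_0)) = \phi_{\mathcal{A}}^{n-1}(\rho_0) = \cdots = \phi_{\mathcal{A}}(\rho_0) = \rho_0. \]

Hence, $\lim \phi_{\mathcal{A}}^n(\rho_0) = \rho_0$.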
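The next result shows that the iterates of some of the quantum operations
considered in Section 9.3 always converge.

Theorem 9.4.2. For any $\rho \in \mathcal{D}(\mathbb{C}^2)$ we have that
(a) $\lim \phi_{bf}^n(\rho) = \frac12 \rho + \frac12 X\rho X$
(b) $\lim \phi_{pf}^n(\rho) = \frac12 \rho + \frac12 Z\rho Z$
(c) $\lim \phi_{bpf}^n(\rho) = \frac12 \rho + \frac12 Y\rho Y$
(d) $\lim \phi_{dp}^n(\rho) = \frac{I}{2}$.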
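Proof. (a) Any $\rho \in \mathcal{D}(\mathbb{C}^2)$ has the Bloch form

\[ \rho = \frac12 \begin{bmatrix} 1 + r_3 & r_1 - i r_2 \\ r_1 + i r_2 & 1 - r_3 \end{bmatrix}, \]

where $r_i \in \mathbb{R}$, $i = 1, 2, 3$, and $r_1^2 + r_2^2 + r_3^2 \le 1$. Because

\[ X\rho X = \frac12 \begin{bmatrix} 1 - r_3 & r_1 + i r_2 \\ r_1 - i r_2 & 1 + r_3 \end{bmatrix} \]

we have that

\[ \phi_{bf}(\rho) = \frac12 \begin{bmatrix} 1 + (2p - 1) r_3 & r_1 - i(2p - 1) r_2 \\ r_1 + i(2p - 1) r_2 & 1 - (2p - 1) r_3 \end{bmatrix}. \]

We can now prove by induction that

\[ \phi_{bf}^n(\rho) = \frac12 \begin{bmatrix} 1 + (2p - 1)^n r_3 & r_1 - i(2p - 1)^n r_2 \\ r_1 + i(2p - 1)^n r_2 & 1 - (2p - 1)^n r_3 \end{bmatrix}. \]

Because $0 < p < 1$, we have $-1 < 2p - 1 < 1$ so that $\lim (2p - 1)^n = 0$. Hence,

\[ \lim \phi_{bf}^n(\rho) = \frac12 \begin{bmatrix} 1 & r_1 \\ r_1 & 1 \end{bmatrix} = \frac12 \rho + \frac12 X\rho X. \]

The proofs of (b) and (c) are similar. To prove (d), a simple induction argument
shows that for every $\rho \in \mathcal{D}(\mathbb{C}^2)$

\[ \phi_{dp}^n(\rho) = (1 - q^n) \frac{I}{2} + q^n \rho, \]

where $q = 1 - p$. Because $0 < q, p < 1$, we have that

\[ \lim \phi_{dp}^n(\rho) = \frac{I}{2}. \]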
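
The limits in Theorem 9.4.2 can be observed by simply iterating the channels. The following sketch assumes $p = 0.3$ and a sample density matrix, and uses 200 iterations, by which point $(2p - 1)^n$ and $(1 - p)^n$ are far below machine precision; it also shows the non-convergent example $\rho \mapsto X\rho X$ from the beginning of the section. Names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def apply_operation(elements, rho):
    """sum_i A_i rho A_i^*."""
    return sum(A @ rho @ A.conj().T for A in elements)


def iterate(elements, rho, n):
    """Apply the quantum operation n times."""
    for _ in range(n):
        rho = apply_operation(elements, rho)
    return rho


p = 0.3
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

bit_flip = [np.sqrt(p) * I2, np.sqrt(1 - p) * X]
depolarizing = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p) / 2 * P for P in (X, Y, Z)]

bf_limit = iterate(bit_flip, rho, 200)
dp_limit = iterate(depolarizing, rho, 200)

# The map rho -> X rho X oscillates between two states instead of converging.
even_iterate = iterate([X], rho, 10)
odd_iterate = iterate([X], rho, 11)
```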


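We see from Theorem 9.4.2(a) that $\lim \phi_{bf}^n = \phi_{bf}^{1/2}$ where

\[ \phi_{bf}^{1/2}(\rho) = \frac12 \rho + \frac12 X\rho X \]

and similar results hold for $\phi_{pf}$ and $\phi_{bpf}$. Notice that $\phi_{bf}^{1/2}$ is an idempotent
quantum operation. Indeed,

\[ \phi_{bf}^{1/2} \circ \phi_{bf}^{1/2}(\rho) = \frac14 \rho + \frac14 X\rho X + \frac14 X\rho X + \frac14 X^2 \rho X^2 = \frac12 \rho + \frac12 X\rho X = \phi_{bf}^{1/2}(\rho). \]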

The next result shows that this always happens. 


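Theorem 9.4.3. If there exists a quantum operation $\phi$ such that $\lim \phi_{\mathcal{A}}^n(\rho) =
\phi(\rho)$ for every $\rho \in \mathcal{D}(H)$, then $\phi$ is idempotent. Moreover, the set of fixed
points of $\phi_{\mathcal{A}}$ coincides with the range $\mathrm{ran}(\phi)$.


Proof. By the continuity of $\phi_{\mathcal{A}}^n$ we have

\[ \phi_{\mathcal{A}}^n(\phi(\rho)) = \phi_{\mathcal{A}}^n\Big( \lim_{m \to \infty} \phi_{\mathcal{A}}^m(\rho) \Big) = \lim_{m \to \infty} \phi_{\mathcal{A}}^{m+n}(\rho) = \phi(\rho). \]

Hence,

\[ \phi \circ \phi(\rho) = \lim \phi_{\mathcal{A}}^n(\phi(\rho)) = \phi(\rho) \]

and we conclude that $\phi$ is idempotent. The last statement follows from
Lemma 9.4.1.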


9.5 Fixed Points 


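Let $\phi_{\mathcal{A}}$ be a quantum operation with $\mathcal{A} = \{A_i, A_i^* : i = 1, \dots, n\}$. The
commutant $\mathcal{A}'$ of $\mathcal{A}$ is the set

\[ \mathcal{A}' = \{B \in \mathcal{B}(H) : BA_i = A_i B,\ BA_i^* = A_i^* B,\ i = 1, \dots, n\}. \]

We denote the set of fixed states of $\phi_{\mathcal{A}}$ by $\mathcal{I}(\phi_{\mathcal{A}})$. That is,

\[ \mathcal{I}(\phi_{\mathcal{A}}) = \{\rho \in \mathcal{D}(H) : \phi_{\mathcal{A}}(\rho) = \rho\}. \]

As an example, it is easy to find $\mathcal{I}(\phi_{pf})$. In this case $\rho \in \mathcal{I}(\phi_{pf})$ if and only
if $\rho = p\rho + (1 - p) Z\rho Z$. This is equivalent to $\rho = Z\rho Z$. Because $Z^2 = I$ we
have that $Z\rho = \rho Z$. We conclude that $\rho \in \mathcal{I}(\phi_{pf})$ if and only if $\rho \in \mathcal{A}'$ where
$\mathcal{A} = \{I, Z\}$. A similar result holds for $\phi_{bf}$ and $\phi_{bpf}$. In general we have the
following result which is a special case of a theorem in [1, 5].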


Theorem 9.5.1. If $\phi_{\mathcal{A}}$ is a self-adjoint, subtracial quantum operation, then
$\mathcal{I}(\phi_{\mathcal{A}}) \subseteq \mathcal{A}' \cap \mathcal{D}(H)$.


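Proof. Let $\rho \in \mathcal{I}(\phi_{\mathcal{A}})$ and let $h$ be a unit eigenvector of $\rho$ corresponding to
the largest eigenvalue $\lambda_1 = \|\rho\|$. Then $\phi_{\mathcal{A}}(\rho) = \rho$ implies that

\[ \lambda_1 = \sum \langle \rho A_i h, A_i h \rangle \le \|\rho\| \sum \|A_i h\|^2 = \lambda_1 \sum \langle A_i^2 h, h \rangle \le \lambda_1. \]

Because $\langle \rho A_i h, A_i h \rangle \le \lambda_1 \langle A_i^2 h, h \rangle$, it follows that

\[ \langle (\lambda_1 I - \rho) A_i h, A_i h \rangle = 0. \]

Hence, $(\lambda_1 I - \rho) A_i h = 0$ for every eigenvector $h$ corresponding to $\lambda_1$. Thus, $A_i$
leaves the $\lambda_1$-eigenspace invariant. Letting $P_1$ be the corresponding spectral
projection of $\rho$ we have $P_1 A_i P_1 = A_i P_1$ which implies that $A_i P_1 = P_1 A_i$,
$i = 1, \dots, n$. Now $\rho = \lambda_1 P_1 + \rho_1$ where $\rho_1$ is a positive operator whose largest
eigenvalue is smaller than $\lambda_1$. Because

\[ \lambda_1 P_1 + \rho_1 = \rho = \phi_{\mathcal{A}}(\rho) = \lambda_1 \phi_{\mathcal{A}}(P_1) + \phi_{\mathcal{A}}(\rho_1) = \lambda_1 P_1 + \phi_{\mathcal{A}}(\rho_1) \]

we have $\phi_{\mathcal{A}}(\rho_1) = \rho_1$. Proceeding by induction, $\rho \in \mathcal{A}'$.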


Corollary 9.5.2. If $\phi_{\mathcal{A}}$ is a self-adjoint, tracial quantum operation, then
$\mathcal{I}(\phi_{\mathcal{A}}) = \mathcal{A}' \cap \mathcal{D}(H)$.


As an application of Corollary 9.5.2 we see that $\mathcal{I}(\phi_{dp}) = \{I/2\}$. Indeed,
if $\rho \in \mathcal{I}(\phi_{dp})$ then $\rho$ must commute with $X$, $Y$, and $Z$. But any $2 \times 2$ matrix
is a linear combination of $I$, $X$, $Y$, and $Z$. It follows that $\rho = I/2$. The next
example which is a special case of an example in [3] shows that self-adjointness
cannot be deleted from Theorem 9.5.1 or Corollary 9.5.2.

Let d4(B) = poee A;BA}; be the quantum operation with 


100 000 
Aj = 000 ; Ag = 010 ’ 
000 000 
1 000 1 000 
tee 2 000). aye 
v2 100 v2 010 


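It is easy to check that $\phi_{\mathcal{A}}$ is unital. However, $\phi_{\mathcal{A}}$ is not self-adjoint and
because

$$\sum A_i^* A_i = \frac{3}{2} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

we see that $\phi_{\mathcal{A}}$ is not subtracial. Let $\rho \in \mathcal{D}(\mathbb{C}^3)$ be the state

$$\rho = \frac{1}{3} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

It is easy to check that $\rho \in \mathcal{I}(\phi_{\mathcal{A}})$ but $\rho A_3 \ne A_3 \rho$ so that $\rho \notin \mathcal{A}'$. If we
multiply the $A_i$, $i = 1, 2, 3, 4$, by $\sqrt{2/3}$ then $\phi_{\mathcal{A}}$ would be subtracial but
again $\mathcal{I}(\phi_{\mathcal{A}}) \not\subseteq \mathcal{A}' \cap \mathcal{D}(H)$.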
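
The claims of this example (unitality, the value of $\sum A_i^* A_i$, invariance of $\rho$, and the failure of commutation with $A_3$) can be checked numerically. The following is only a minimal sketch in plain Python; the helper names (`matmul`, `phi`, `gram`, and so on) are illustrative choices and not notation from the text, and floating-point comparisons use a small tolerance.

```python
from math import sqrt, isclose

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(isclose(x, y, abs_tol=tol)
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / sqrt(2)
ops = [
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],   # A_1
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],   # A_2
    [[0, 0, 0], [0, 0, 0], [s, 0, 0]],   # A_3
    [[0, 0, 0], [0, 0, 0], [0, s, 0]],   # A_4
]
zero = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def phi(b):
    """phi_A(B) = sum_i A_i B A_i^* (real entries, so ^* is the transpose)."""
    out = zero
    for a in ops:
        out = add(out, matmul(matmul(a, b), transpose(a)))
    return out

# sum_i A_i^* A_i, which decides whether phi_A is subtracial
gram = zero
for a in ops:
    gram = add(gram, matmul(transpose(a), a))

rho = [[2 / 3, 0, 0], [0, 0, 0], [0, 0, 1 / 3]]

is_unital = close(phi(identity), identity)
rho_is_fixed = close(phi(rho), rho)
rho_commutes_with_A3 = close(matmul(rho, ops[2]), matmul(ops[2], rho))
```

Under these assumptions `is_unital` and `rho_is_fixed` hold, `gram` equals $\tfrac{3}{2}\,\mathrm{diag}(1,1,0)$, and `rho_commutes_with_A3` is false, in agreement with the example.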


9.6 Idempotents 


We showed in Lemma 9.1.2 that if $\phi_{\mathcal{A}}$ is subtracial and its operation elements
are self-adjoint projection operators, then $\phi_{\mathcal{A}}$ is idempotent. We conjecture
that a weak converse of this result holds. If $L$ is a Lüders map that is idempotent,
we conjecture that $L$ can be written in a form so that its operation
elements are self-adjoint projections. As a start, our next result shows that
this conjecture holds in $\mathbb{C}^2$ if $L$ has two operation elements.


Theorem 9.6.1. Suppose $L(B) = A_1^{1/2} B A_1^{1/2} + A_2^{1/2} B A_2^{1/2}$, $A_1, A_2 \ge 0$,
$A_1 + A_2 = I$, is a Lüders map on $\mathbb{C}^2$ and $L^2 = L$. Then $A_1$ and $A_2$ are
self-adjoint projection operators or $L$ is the identity map.


Proof. Because $A_1 + A_2 = I$, $A_1$ and $A_2$ commute and because $L^2 = L$ we
have

$$A_1^{1/2} B A_1^{1/2} + A_2^{1/2} B A_2^{1/2} = A_1 B A_1 + A_2 B A_2 + 2 A_1^{1/2} A_2^{1/2} B A_1^{1/2} A_2^{1/2} \tag{9.10}$$


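for every $B \in \mathcal{B}(\mathbb{C}^2)$. Without loss of generality, we can assume that $A_1$ is
diagonal so that

$$A_1 = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1-a & 0 \\ 0 & 1-b \end{pmatrix}, \qquad 0 \le a, b \le 1 .$$

Letting $B$ be a matrix with a nonzero off-diagonal entry, for instance

$$B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

in equation (9.10) and equating entries we obtain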


$$\sqrt{ab} + \sqrt{(1-a)(1-b)} = (1-a)(1-b) + ab + 2\sqrt{ab(1-a)(1-b)} . \tag{9.11}$$


Equation (9.11) can be written as

$$\left( 1 - \sqrt{ab} - \sqrt{(1-a)(1-b)} \right) \left( \sqrt{ab} + \sqrt{(1-a)(1-b)} \right) = 0 .$$

We conclude that $\sqrt{ab} + \sqrt{(1-a)(1-b)} = 0$ or $1$. In the first case $a = 0$,
$b = 1$ or $a = 1$, $b = 0$ and we are finished. In the second case, we can square
the expression to obtain

$$2\sqrt{ab(1-a)(1-b)} = a + b - 2ab . \tag{9.12}$$

Squaring (9.12) gives

$$(a - b)^2 = a^2 + b^2 - 2ab = 0$$

so that $a = b$. Hence, $A_1 = aI$, $A_2 = (1-a)I$, and $L(B) = B$ for all
$B \in \mathcal{B}(\mathbb{C}^2)$.


9.7 Sequential Measurements 


This section discusses a topic that is important in quantum measurement
theory, namely sequential products of effects. In this section we allow $H$ to be
infinite-dimensional and again denote the set of effects on $H$ by $\mathcal{E}(H)$. Recall
that effects represent yes-no quantum measurements that may be unsharp
(imprecise). We may think of effects as fuzzy quantum events. Sharp quantum
events are represented by self-adjoint projection operators. Denoting this set
by $\mathcal{P}(H)$ we have that $\mathcal{P}(H) \subseteq \mathcal{E}(H)$.

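We mentioned in Section 9.1 that for a quantum system initially in the
state $\rho \in \mathcal{D}(H)$, the postmeasurement state given that $A \in \mathcal{E}(H)$ occurs is
$A^{1/2} \rho A^{1/2} / \operatorname{tr}(\rho A)$. Assuming that $\operatorname{tr}(\rho A) \ne 0$, it is reasonable to define the
conditional probability of $B \in \mathcal{E}(H)$ given $A \in \mathcal{E}(H)$ to be

$$P_\rho(B \mid A) = \frac{\operatorname{tr}(A^{1/2} \rho A^{1/2} B)}{\operatorname{tr}(\rho A)} = \frac{\operatorname{tr}(\rho A^{1/2} B A^{1/2})}{\operatorname{tr}(\rho A)} . \tag{9.13}$$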


Now two measurements $A, B \in \mathcal{E}(H)$ cannot be performed simultaneously
in general, so they are frequently executed sequentially. We denote by $A \circ B$
a sequential measurement in which $A$ is performed first and $B$ second. It is
natural to assume the probabilistic equation

$$P_\rho(A \circ B) = P_\rho(A) P_\rho(B \mid A) . \tag{9.14}$$

Combining (9.13) and (9.14) gives

$$\operatorname{tr}(\rho A \circ B) = \operatorname{tr}(\rho A^{1/2} B A^{1/2}) . \tag{9.15}$$




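Equation (9.15) motivates our definition $A \circ B = A^{1/2} B A^{1/2}$ and we call
$A \circ B$ the sequential product of $A$ and $B$. If $\{A_1, \ldots, A_n\}$ is a finite POV
measure, then the Lüders map with operation elements $A_i^{1/2}$ can now be written
as $L(B) = \sum A_i \circ B$. Notice that $A \circ B \in \mathcal{E}(H)$ so $\circ$ gives a binary operation
on $\mathcal{E}(H)$. Indeed,

$$0 \le \langle A^{1/2} B A^{1/2} x, x \rangle = \langle B A^{1/2} x, A^{1/2} x \rangle \le \langle A^{1/2} x, A^{1/2} x \rangle = \langle A x, x \rangle \le \langle x, x \rangle \tag{9.16}$$

so that $0 \le A^{1/2} B A^{1/2} \le I$. It also follows from (9.16) that $A \circ B \le A$.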
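We say that $A, B \in \mathcal{E}(H)$ are compatible if $AB = BA$. It is clear that the
sequential product satisfies

$$0 \circ A = A \circ 0 = 0,$$
$$I \circ A = A \circ I = A,$$
$$A \circ (B + C) = A \circ B + A \circ C \quad \text{whenever } B + C \le I,$$
$$(\lambda A) \circ B = A \circ (\lambda B) = \lambda (A \circ B) \quad \text{for } 0 \le \lambda \le 1 .$$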
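
These identities, together with $A \circ B \le A$, can be spot-checked numerically for $2 \times 2$ effects. The sketch below is only illustrative: the matrices `A`, `B`, `C` and the scalar `lam` are arbitrary sample values chosen here, and the square root uses the closed form $\sqrt{M} = (M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$, valid for positive semidefinite $2 \times 2$ matrices by the Cayley-Hamilton theorem.

```python
from math import sqrt, isclose

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(isclose(a[i][j], b[i][j], abs_tol=tol)
               for i in range(2) for j in range(2))

def sqrt_psd(m):
    """Square root of a real symmetric positive semidefinite 2x2 matrix."""
    det = max(m[0][0] * m[1][1] - m[0][1] * m[1][0], 0.0)
    s = sqrt(det)
    tau = sqrt(m[0][0] + m[1][1] + 2 * s)
    if tau == 0:            # only happens for the zero matrix
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(m[i][j] + (s if i == j else 0.0)) / tau for j in range(2)]
            for i in range(2)]

def seq(a, b):
    """Sequential product A o B = A^{1/2} B A^{1/2}."""
    r = sqrt_psd(a)
    return matmul(matmul(r, b), r)

def is_psd(m, tol=1e-12):
    """Positivity test for a symmetric 2x2 matrix via trace and determinant."""
    return (m[0][0] + m[1][1] >= -tol
            and m[0][0] * m[1][1] - m[0][1] * m[1][0] >= -tol)

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.25], [0.25, 0.5]]     # sample effect
B = [[0.3, 0.1], [0.1, 0.2]]       # sample effect
C = [[0.4, 0.0], [0.0, 0.1]]       # chosen so that B + C <= I
lam = 0.3

additive = close(seq(A, add(B, C)), add(seq(A, B), seq(A, C)))
homogeneous = (close(seq(scale(lam, A), B), scale(lam, seq(A, B)))
               and close(seq(A, scale(lam, B)), scale(lam, seq(A, B))))
unit_laws = close(seq(I2, A), A) and close(seq(A, I2), A)
dominated = is_psd(add(A, scale(-1.0, seq(A, B))))   # A o B <= A
```

A numerical check of this kind does not prove the identities, but it is a convenient guard against sign or ordering mistakes when experimenting with the sequential product.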
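However, $A \circ B$ has practically no other algebraic properties unless compatibility
conditions are imposed. To illustrate the fact that $A \circ B$ does not have
properties that one might expect, we now show that $A \circ B = A \circ C$ does not
imply that $B \circ A = C \circ A$ even for $A, B, C \in \mathcal{P}(H)$. In $H = \mathbb{C}^2$ consider
$A, B, C \in \mathcal{P}(H)$ given by the following matrices,

$$A = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
C = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

We then have

$$A \circ B = ABA = \tfrac{1}{2} A = ACA = A \circ C .$$

However,

$$B \circ A = BAB = \tfrac{1}{2} B \ne \tfrac{1}{2} C = CAC = C \circ A .$$

This example also shows that $A \circ B \not\le B$ in general, even though we always
have $A \circ B \le A$.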
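
The computation in this example is easy to reproduce with exact rational arithmetic. The following sketch assumes only that $A$, $B$, $C$ are the projections displayed above; since a projection satisfies $P^{1/2} = P$, the helper `seq_proj` (a name introduced here for illustration) computes $P \circ Q = PQP$.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def seq_proj(p, q):
    """P o Q = P Q P; valid because a projection P satisfies P^{1/2} = P."""
    return matmul(matmul(p, q), p)

half = F(1, 2)
A = [[half, half], [half, half]]
B = [[F(1), F(0)], [F(0), F(0)]]
C = [[F(0), F(0)], [F(0), F(1)]]

a_then_b = seq_proj(A, B)   # equals (1/2) A
a_then_c = seq_proj(A, C)   # equals (1/2) A as well
b_then_a = seq_proj(B, A)   # equals (1/2) B
c_then_a = seq_proj(C, A)   # equals (1/2) C
```

Comparing `a_then_b` with `a_then_c`, and `b_then_a` with `c_then_a`, reproduces the two displayed chains of equalities.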

We say that $A, B$ are sequentially independent if $A \circ B = B \circ A$. It is clear
that if $A$ and $B$ are compatible, then they are sequentially independent. To
prove the converse, we need the following result due to Fuglede–Putnam–Rosenblum [11].


Theorem 9.7.1. If $M, N, T \in \mathcal{B}(H)$ with $M$ and $N$ normal, then $MT = TN$
implies that $M^* T = T N^*$.


Corollary 9.7.2. [8] For $A, B \in \mathcal{E}(H)$, $A \circ B = B \circ A$ implies $AB = BA$.
Proof. Because $A \circ B = B \circ A$ we have

$$A^{1/2} B^{1/2} B^{1/2} A^{1/2} = B^{1/2} A^{1/2} A^{1/2} B^{1/2} .$$

Hence, $M = A^{1/2} B^{1/2}$ and $N = B^{1/2} A^{1/2}$ are normal. Letting $T = A^{1/2}$ we
have $MT = TN$. Applying Theorem 9.7.1, we conclude that $B^{1/2} A = A B^{1/2}$.
Hence,

$$BA = B^{1/2} A B^{1/2} = AB .$$


Sequential independence for three or more effects was considered in [8]
and a more general result was proved. Our next result shows that if $A \circ B$ is
sharp, then $A$ and $B$ are compatible (and hence, sequentially independent).


Theorem 9.7.3. [8] For $A, B \in \mathcal{E}(H)$, if $A \circ B \in \mathcal{P}(H)$, then $AB = BA$.


Proof. Assume that $A \circ B \in \mathcal{P}(H)$. Suppose that $A \circ B x = x$ where $\|x\| = 1$.
We then have $\langle B A^{1/2} x, A^{1/2} x \rangle = 1$. By Schwarz's inequality we have
$B A^{1/2} x = A^{1/2} x$ and hence, $A x = A \circ B x = x$. Because $x$ is an eigenvector
of $A$ with eigenvalue 1, the same holds for $A^{1/2}$. Thus, $A^{1/2} x = x$ so that
$B A^{1/2} x = A \circ B x$. We conclude that $B A^{1/2} x = A \circ B x$ for all $x$ in the range
of $A \circ B$. Now suppose that $A \circ B x = 0$. We then have

$$\| B^{1/2} A^{1/2} x \|^2 = \langle B^{1/2} A^{1/2} x, B^{1/2} A^{1/2} x \rangle = \langle A \circ B x, x \rangle = 0$$

so that $B^{1/2} A^{1/2} x = 0$. Hence, $B A^{1/2} x = 0$ and it follows that $B A^{1/2} x = A \circ B x$
for all $x$ in the null space of $A \circ B$. We conclude that $B A^{1/2} = A \circ B$.
Hence,

$$B A^{1/2} = A \circ B = (A \circ B)^* = A^{1/2} B$$

so that $AB = BA$.


The last theorem shows why it is important to consider unsharp effects.
Even if $A$ and $B$ are sharp, $A \circ B \notin \mathcal{P}(H)$ unless $A$ and $B$ are compatible.
Simple examples show that the converse of Theorem 9.7.3 does not
hold. However, the converse does hold for sharp effects.


Corollary 9.7.4. If $A, B \in \mathcal{P}(H)$ then $A \circ B \in \mathcal{P}(H)$ if and only if $AB = BA$.


It follows from Corollary 9.7.4 that for $A, B \in \mathcal{P}(H)$ we have $A \circ B = B$ if
and only if $AB = BA = B$. We now generalize this result to arbitrary effects.


Theorem 9.7.5. [8] For $A, B \in \mathcal{E}(H)$ the following statements are equivalent.
(a) $A \circ B = B$. (b) $B \circ A = B$. (c) $AB = BA = B$.


Proof. It is clear that (c) implies both (a) and (b). It then suffices to show
that (a) and (b) each imply (c). If $A \circ B = B$ we have

$$B^2 A = A^{1/2} B A^{1/2} B A = A^{1/2} B ( A^{1/2} B A^{1/2} ) A^{1/2} = A^{1/2} B^2 A^{1/2} .$$

Taking adjoints gives $B^2 A = A B^2$. It follows that $AB = BA = B$. If $B \circ A = B$
then for every $x \in H$ we have

$$\langle A B^{1/2} x, B^{1/2} x \rangle = \langle B \circ A x, x \rangle = \langle B x, x \rangle = \| B^{1/2} x \|^2 .$$

If $B^{1/2} x \ne 0$ then

$$\left\langle A \frac{B^{1/2} x}{\| B^{1/2} x \|}, \frac{B^{1/2} x}{\| B^{1/2} x \|} \right\rangle = 1 .$$

It follows from Schwarz's inequality that $A B^{1/2} x = B^{1/2} x$. Hence, $A B^{1/2} = B^{1/2}$
so $A B^{1/2} = B^{1/2} A = B^{1/2}$. We again conclude that $AB = BA = B$.


Theorem 9.7.5 cannot be strengthened to the case $A \circ B \le B$. That is,
$A \circ B \le B$ does not imply $AB = BA$. For example, in $\mathbb{C}^2$ let


I i 1 |30 
4az|ii} =a loi} 
then $A \circ B \le B$ but $AB \ne BA$.
The simplest version of the law of total probability would say that 


$$P_\rho(B) = P_\rho(A) P_\rho(B \mid A) + P_\rho(I - A) P_\rho(B \mid I - A), \tag{9.17}$$

where we interpret $I - A$ as the complement (or negation) of $A \in \mathcal{E}(H)$. In
terms of the sequential product (9.17) can be written as

$$P_\rho(B) = P_\rho(A \circ B) + P_\rho((I - A) \circ B) = P_\rho[A \circ B + (I - A) \circ B] . \tag{9.18}$$

When does (9.18) hold for every $\rho \in \mathcal{D}(H)$? Equivalently, when does the
following equation hold?

$$B = A \circ B + (I - A) \circ B . \tag{9.19}$$

This question is also equivalent to finding the fixed points of the Lüders map
$L(B) = A \circ B + (I - A) \circ B$ for $B \in \mathcal{E}(H)$.


Theorem 9.7.6. [5, 8] For $A, B \in \mathcal{E}(H)$, (9.19) holds if and only if $AB = BA$.


Proof. It is clear that (9.19) holds if $AB = BA$. Conversely, assume that
(9.19) holds and write it as

$$B - A^{1/2} B A^{1/2} = (I - A)^{1/2} B (I - A)^{1/2} .$$

Multiplying by $A^{1/2}$ on the left and right, we obtain

$$\begin{aligned}
A^{1/2} B A^{1/2} &= ABA + (I - A)^{1/2} A^{1/2} B A^{1/2} (I - A)^{1/2} \\
&= ABA + (I - A)^{1/2} \left[ B - (I - A)^{1/2} B (I - A)^{1/2} \right] (I - A)^{1/2} \\
&= ABA - (I - A) B (I - A) + (I - A)^{1/2} B (I - A)^{1/2} \\
&= ABA - (I - A) B (I - A) + B - A^{1/2} B A^{1/2} .
\end{aligned}$$

Hence,

$$2 A^{1/2} B A^{1/2} = ABA - (I - A) B (I - A) + B = AB + BA . \tag{9.20}$$

Using the commutator notation $[X, Y] = XY - YX$, (9.20) gives

$$\left[ A^{1/2}, [A^{1/2}, B] \right] = A^{1/2} ( A^{1/2} B - B A^{1/2} ) - ( A^{1/2} B - B A^{1/2} ) A^{1/2} = AB - 2 A^{1/2} B A^{1/2} + BA = 0 .$$

It follows that for every spectral projection $E$ of $A$ we have

$$\left[ E, [A^{1/2}, B] \right] = 0 .$$

By the Jacobi identity

$$\left[ E, [A^{1/2}, B] \right] + \left[ B, [E, A^{1/2}] \right] + \left[ A^{1/2}, [B, E] \right] = 0 .$$

We have that $[A^{1/2}, [E, B]] = 0$. As before we obtain $[E, [E, B]] = 0$. Hence,

$$0 = E(EB - BE) - (EB - BE)E = EB + BE - 2EBE$$

which we can write as

$$EB = 2EBE - BE .$$

Multiplying on the left by $E$ gives $EB = EBE$. Hence,

$$EB = (EBE)^* = BE .$$

It follows that $AB = BA$.
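
Theorem 9.7.6 can be illustrated in $\mathbb{C}^2$ with exact arithmetic. The sketch below is restricted, as a simplifying assumption, to the case where $A$ is a projection, so that $A^{1/2} = A$ and $(I - A)^{1/2} = I - A$; the names `luders`, `P`, `D`, and `B` are illustrative choices and not taken from the text.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

I2 = [[F(1), F(0)], [F(0), F(1)]]

def luders(p, b):
    """L(B) = P o B + (I - P) o B = P B P + (I - P) B (I - P) for a projection P."""
    q = sub(I2, p)
    return add(matmul(matmul(p, b), p), matmul(matmul(q, b), q))

half = F(1, 2)
P = [[half, half], [half, half]]          # projection not commuting with B
D = [[F(1), F(0)], [F(0), F(0)]]          # projection commuting with B
B = [[F(3, 4), F(0)], [F(0), F(1, 4)]]    # an effect

commuting_case = luders(D, B)       # equals B, as (9.19) predicts when DB = BD
noncommuting_case = luders(P, B)    # equals I/2, which differs from B
```

In the noncommuting case the law of total probability (9.18) fails for suitable states, which is exactly the content of the theorem.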


Although the sequential product is always distributive on the right, Theorem 9.7.6
shows that it is not always distributive on the left. That is,
$(A + B) \circ C \ne A \circ C + B \circ C$ in general, when $A + B \le I$. Indeed, if
$AC \ne CA$, then by Theorem 9.7.6 we have

$$A \circ C + (I - A) \circ C \ne C = [A + (I - A)] \circ C .$$


One might conjecture that the following generalization of Theorem 9.7.6
holds. If $A + B \le I$ and $(A + B) \circ C = A \circ C + B \circ C$, then $CA = AC$ or
$CB = BC$. However, this conjecture is false. Indeed, suppose that $CB \ne BC$.
Nevertheless, we have

$$\left( \tfrac{1}{2} B + \tfrac{1}{2} B \right) \circ C = B \circ C = \tfrac{1}{2} B \circ C + \tfrac{1}{2} B \circ C = \left( \tfrac{1}{2} B \right) \circ C + \left( \tfrac{1}{2} B \right) \circ C .$$


We close by considering another generalization of Theorem 9.7.6. Suppose
$A_i \in \mathcal{E}(H)$, $i = 1, \ldots, n$ with $\sum A_i = I$ and that $B = \sum A_i \circ B$. Does this
imply that $B A_i = A_i B$, $i = 1, \ldots, n$? Notice that the answer is affirmative
if $A_i \in \mathcal{P}(H)$, $i = 1, \ldots, n$. In fact, we only need $A_i \in \mathcal{P}(H)$, $i = 1, \ldots, n$
and $\sum A_i \le I$. In this case, we have $A_i A_j = A_j A_i = 0$ for $i \ne j$. Hence, if
$B = \sum A_i \circ B$, then $A_i B = B A_i = A_i \circ B$, $i = 1, \ldots, n$. A proof very similar to
that in Theorem 9.5.1 gives an affirmative answer when $\dim H < \infty$ or when
$B$ has discrete spectrum with a strictly decreasing sequence of eigenvalues.
However, when $\dim H = \infty$ the answer is negative in general [1].
Chapter 10
Ekeland Duality as a Paradigm 


Jean-Paul Penot 


Summary. The Ekeland duality scheme is a simple device. We examine its 
relationships with several classical dualities, such as the Fenchel—Rockafellar 
duality, the Toland duality, the Wolfe duality, and the quadratic duality. In 
particular, we show that the Clarke duality is a special case of the Ekeland 
duality scheme. 


Key words: Clarke duality, duality, Ekeland duality, Fenchel transform, 
Legendre function, Legendre transform, nonsmooth analysis 


10.1 Introduction 


Duality is a general tool in mathematics. It consists in transforming a difficult 
problem into a related one which is more tractable; then, when returning to 
the initial, or “primal”, problem, some precious information becomes avail- 
able. Although such a process is of common use in optimization theory and 
algorithms (see [23, 41, 45] and their references), it pertains to a much larger 
field. Cramer, Fourier, Laplace, and Radon transforms give testimonies of the 
power of such a scheme. 

Even in optimization theory, there is a large spectrum of duality proces- 
ses: linear programming, convex programming, fractional programming [21], 
geometric programming, generalized convex programming, quadratic pro- 
gramming [13], semidefinite programming, and so on. It is the purpose of 
the present chapter to show that several classical duality theories can be cast 
into a simple general framework. 
A number of physical phenomena can be described by using the minimizers
of a suitable potential function; however, it may be sensible to consider that 
a notion of stationarity is more adapted than minimization or maximization. 

In a famous paper [14] I. Ekeland introduced a duality scheme that deals 
with critical points instead of minimizers and takes advantage of the power of 
the tools of differential topology. In order to extend the reach of his theory we 
drop the smoothness properties required in [14], following a track indicated 
in [15]. For such an aim, we make use of elementary notions of nonsmooth 
analysis recalled in Section 10.4 below. 

We particularly focus our attention on the convex case for which a close 
link between the classical Fenchel duality and the Ekeland duality can be 
obtained thanks to a slight extension of the Bronsted—Rockafellar theorem. 
But we also consider the concave case, the quadratic case, the Toland duality, 
and the Clarke duality. The Clarke duality deals with the study of the set of 
critical points of a function f of the form 


$$f(x) = \tfrac{1}{2} \langle Ax, x \rangle + g(x), \qquad x \in X,$$

where $X$ is a Banach space, $A$ is a self-adjoint operator from $X$ into $X^*$
(i.e., $\langle Ax, x' \rangle = \langle x, Ax' \rangle$ for any $x, x' \in X$) and $g\colon X \to \mathbb{R}_\infty := \mathbb{R} \cup \{+\infty\}$ is
a closed proper convex function. It has been applied to the study of solutions
to the Hamilton equation in [5, 7-10, 16-20].

It is the main purpose of the present chapter to endeavor to cast the 
Clarke duality in the general framework of the Ekeland duality. Such an 
aim may enhance the interest for this general approach. We also obtain a 
slight complement to the Clarke duality. On the other hand, we assume that 
the operator A is continuous (instead of densely defined). This assumption 
guarantees that the notion of critical point we adopt corresponds to a general 
and natural concept and is not just an ad hoc specific notion. This new feature 
is valid for all usual subdifferentials of nonsmooth analysis. This assumption 
suffices for the application to Hamiltonian systems. 

In Sections 10.2 and 10.3 we recall the Ekeland duality in the frame- 
work of normed vector spaces (n.v.s.). In Section 10.4 we present tools from 
nonsmooth analysis which enable one to give a rigorous treatment without 
assuming regularity assumptions. In particular, we introduce a concept of ex- 
tended Legendre function using methods reminiscent of the notion of limiting 
subdifferentials (Section 10.5). Such a concept encompasses the case of the 
Fenchel conjugate of a convex function. Therefore we can apply it to convex 
duality and show in Section 10.6 that the Fenchel—Rockafellar duality is part 
of the duality scheme we study. The same is shown for the Toland duality 
in Section 10.7 and for the Wolfe duality in Section 10.8. The last section 
is devoted to showing that Clarke duality is a special instance of Ekeland 
duality. 

We do not look for completeness but we endeavor to put some light on some 
significant instances. Duality of integral functionals is considered elsewhere. 
Duality in the calculus of variations using Ekeland's scheme is performed
in [14] and [15].

Because, as mentioned above, many phenomena in physics and mechanics 
can be modeled by using critical point theory rather than minimization, we 
believe that the extensive approach by D. Gao and his co-authors (see [22-29] 
and their references) deserves some more attention and should be combined 
with the present contribution. 

In the sequel $\mathbb{P}$ stands for the set of positive real numbers, $B(0, r)$ is the
open ball with center 0 and radius $r$, and $S_X := \{u \in X : \|u\| = 1\}$ is the
unit sphere in a normed vector space.


10.2 Preliminaries: The Ekeland—Legendre Transform 


The Ekeland duality deals with the search of critical points and critical values 
of functions or multifunctions. It can be cast in a general framework in which 
there is no linear structure (see [44]), but here we remain in the framework 
of normed vector spaces (n.v.s.) in duality. 


Definition 10.1. Given two n.v.s. $X$, $X'$ and a subset $J$ of $X \times X' \times \mathbb{R}$, a
pair $(x, r)$ is called a critical pair of $J$ if $(x, 0_{X'}, r) \in J$. A point $x$ of $X$ is
called a critical point of $J$ if there exists some $r \in \mathbb{R}$ such that $(x, r)$ is a
critical pair of $J$. A real number $r$ is called a critical value of $J$ if there exists
some $x \in X$ such that $(x, r)$ is a critical pair of $J$.


The extremization of $J$ consists in the determination of the set $\operatorname{ext} J$ of
critical pairs of $J$. When $J$ is a generalized 1-jet in the sense that the projection
$G$ of $J$ on $X \times \mathbb{R}$ is the graph of a function $j\colon X_0 \to \mathbb{R}$ defined on
some subset $X_0$ of $X$, the extremization of $J$ is reduced to the search of critical
points of $J$. Note that $J$ is a generalized 1-jet if and only if one has the
implication

$$(x_1, x_1', r_1) \in J, \quad (x_2, x_2', r_2) \in J, \quad x_1 = x_2 \;\Longrightarrow\; r_1 = r_2 .$$


Example 10.1. In the classical case $X'$ is the topological dual space $X^*$ of
$X$ and $J$ is the 1-jet $J^1 j$ of a differentiable function $j\colon X_0 \to \mathbb{R}$, where $X_0$ is
an open subset of $X$, defined by

$$J^1 j := \{(x, Dj(x), j(x)) : x \in X_0\},$$

where $Dj(x)$ is the derivative of $j$ at $x$. Then we recover the usual notion.
One may also suppose as in [14] that $X_0$ is a differentiable submanifold in $X$
and replace $Dj(x)$ by $dj_x$, the restriction to the tangent space to $X_0$ at $x$ of
the 1-form $dj$.

The fact that J may be different from a 1-jet gives a great versatility to 
the duality which is exposed. 
Example 10.2. Given a convex function $j\colon X \to \mathbb{R}_\infty := \mathbb{R} \cup \{+\infty\}$, let $X'$
be the topological dual space $X^*$ of $X$ and let $J$ be the subjet of $j$, defined
by

$$J := \{(x, x^*, j(x)) : x \in \operatorname{dom} j,\ x^* \in \partial j(x)\},$$

where $\operatorname{dom} j := j^{-1}(\mathbb{R})$ and $\partial j(x) \subset X^*$ is the Fenchel–Moreau subdifferential
of $j$ at $x$ given by

$$x^* \in \partial j(x) \iff j(\cdot) \ge x^*(\cdot) + j(x) - x^*(x) .$$

Then the extremization of $J$ coincides with the minimization of $j$.
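
The statement of Example 10.2 can be made concrete on a sampled version of $X = \mathbb{R}$. The following sketch is an illustration under stated assumptions: the function $j(x) = |x - 1|$, the grid of sample points, and the finite list of candidate slopes are all chosen here, and the subgradient inequality is tested only at the sampled points.

```python
from fractions import Fraction as F

grid = [F(k, 2) for k in range(-6, 7)]       # sample points of X = R
slopes = [F(k, 4) for k in range(-8, 9)]     # candidate elements of X* = R

def j(x):
    """A convex function chosen for illustration."""
    return abs(x - 1)

def subdiff(x):
    """Slopes s with j(y) >= j(x) + s*(y - x) for every sampled y."""
    return [s for s in slopes
            if all(j(y) >= j(x) + s * (y - x) for y in grid)]

# Sampled subjet of j and its critical pairs (those with zero slope)
subjet = {(x, s, j(x)) for x in grid for s in subdiff(x)}
ext_J = {(x, r) for (x, s, r) in subjet if s == 0}

# Minimizers of j over the same grid
lowest = min(j(y) for y in grid)
minimizers = {x for x in grid if j(x) == lowest}
```

Here `ext_J` consists of the single pair formed by the minimizer $x = 1$ and the minimum value $0$, so the critical points of the sampled subjet are exactly the minimizers found by direct search.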


In view of its importance for the sequel, let us anticipate Section 10.4 by 
presenting the next example. 


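Example 10.3. Let $J$ be the subjet $J^{\partial} j$ of a function $j\colon X \to \mathbb{R}_\infty := \mathbb{R} \cup \{\infty\}$
associated with some subdifferential $\partial$:

$$J^{\partial} j := \{(x, x', r) \in X \times X' \times \mathbb{R} : x' \in \partial j(x),\ r = j(x)\} .$$

In such a case, $\operatorname{ext} J$ is the set of pairs $(x, r)$ such that $0_{X'} \in \partial j(x)$, $r = j(x)$.
We make clear what we mean by “subdifferential” in Section 10.4. For the
moment we may take for $\partial j$ either the proximal subdifferential $\partial^P j$ of $j$,
given by $x^* \in \partial^P j(x)$ iff

$$\exists c, r \in \mathbb{P} : \forall u \in B(0, r) \quad j(x + u) \ge x^*(u) + j(x) - c \|u\|^2 ,$$

or the Fréchet (or firm) subdifferential $\partial^F j$ of $j$ given by $x^* \in \partial^F j(x)$ iff

$$\exists \alpha \in \mathcal{A} : \forall u \in X \quad j(x + u) \ge x^*(u) + j(x) - \alpha(\|u\|) \|u\| ,$$

where $\mathcal{A}$ is the set of functions $\alpha\colon \mathbb{R}_+ \to \mathbb{R}_+ \cup \{+\infty\}$ satisfying $\lim_{r \to 0} \alpha(r) = 0$,
or the Dini–Hadamard (or directional) subdifferential $\partial^D j$ of $j$ given by
$x^* \in \partial^D j(x)$ iff

$$\forall u \in S_X,\ \exists \alpha \in \mathcal{A} : \forall (v, t) \in X \times \mathbb{R}_+ \quad j(x + tv) \ge x^*(tv) + j(x) - \alpha(\|u - v\| + t)\, t ,$$

or the Clarke–Rockafellar subdifferential given by $x^* \in \partial^C j(x)$ iff

$$\exists \alpha \in \mathcal{A} : \forall (x', v, t) \in X^2 \times \mathbb{R}_+ \quad j(x' + tv) \ge x^*(tv) + j(x') - \alpha(s)\, t ,$$

with $s := \|u - v\| + \|x' - x\| + t$ (in the case where $j$ is continuous). Of course,
in the preceding definitions we assume $j$ is finite at $x$ and we take the empty
set otherwise.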


We can generalize the preceding cases by considering other subdifferentials 
appropriate for nonconvex functions (here we have chosen the most usual 
subdifferentials among classical ones). 
Example 10.4. Let $j\colon X \to \mathbb{R}$ be a concave function and let $J$ be the subjet
$J^{\partial} j$ of $j$ for one of the first three preceding subdifferentials. Then the
extremization of $J$ leads to the maximization of $j$. In fact, if $x^* \in \partial j(x)$, then
for all $u \in X$ one has

$$j'(x, u) := \lim_{(t, v) \to (0_+, u)} \frac{1}{t} \left( j(x + tv) - j(x) \right) \ge x^*(u),$$

so that $j$ is Hadamard differentiable at $x$, with derivative $x^*$. Thus $-x^* \in
\partial(-j)(x)$ and if $x^* = 0$ we get that $x$ is a maximizer of $j$. If $x^* \in \partial^C j(x)$
and $j$ is continuous, we also have $-x^* \in \partial^C(-j)(x) = \partial(-j)(x)$ because $j$ is
locally Lipschitzian.


Example 10.5. Given a subdifferential $\partial$ and a function $j\colon X \to \mathbb{R}_\infty$, let

$$J := \{(x, x', r) \in X \times X' \times \mathbb{R} : x' \in \partial j(x) \cup (-\partial(-j)(x)),\ r = j(x)\} .$$

This choice is justified by the case where $j$ is concave. In such a case, a pair
$(x, r)$ is critical if and only if $x$ is a maximizer of $j$ and $r = \max j(X)$: the
condition is sufficient because for any maximizer $x$ of $j$ one has $0 \in \partial(-j)(x)$;
we have seen that it is necessary when $0 \in \partial j(x)$ and it is obviously necessary
when $0 \in -\partial(-j)(x)$ because $-j$ is convex.


Example 10.6. Let $j$ be a d.c. function, that is, a function of the form
$j := g - h$, where $g$ and $h$ are convex functions on some convex subset of $X$.
Let

$$J := \{(x, x', r) \in X \times X' \times \mathbb{R} : x' \in \partial g(x) \boxminus \partial h(x),\ r = j(x)\},$$

where, for two subsets $C$, $D$ of $X'$, $C \boxminus D$ denotes the set of $x' \in X'$ such
that $D + x' \subset C$. Some sufficient conditions ensuring that $\partial g(x) \boxminus \partial h(x)$
coincides with the Fréchet subdifferential of $j$ are known [1]; but in general
$J$ is different from $J^F j$.


Example 10.7. Let $(S, \mathcal{S}, \sigma)$ be a measured space, let $E$ be a Banach space,
and let $\ell\colon S \times E \to \mathbb{R}$ be a measurable integrand, with which is associated
the integral functional $j$ given by

$$j(x) = \int_S \ell(s, x(s))\, d\sigma(s), \qquad x \in X,$$

where $X$ is some normed vector space of (classes of) measurable functions
from $S$ to $E$; for instance $X := L_p(S, E)$ for some $p \in [1, +\infty[$. Then, if $X'$
is a space of measurable functions from $S$ to the dual $E^*$ of $E$ (for instance
$X' := L_q(S, E^*)$, with $q := (1 - p^{-1})^{-1}$) one can take

$$J := \{(x, x', r) \in X \times X' \times \mathbb{R} : x'(s) \in \partial \ell_s(x(s)) \text{ a.e. } s \in S,\ r = j(x)\},$$

where $\ell_s := \ell(s, \cdot)$. One can give conditions ensuring that $J$ is exactly the
subjet of $j$; but in general that is not the case.
Let us present another example of a different kind bearing on mathematical
programming.


Example 10.8. Let $X$ and $Z$ be n.v.s. with dual spaces $X^*$ and $Z^*$, respectively.
Given a closed convex cone $C$ in $Z$ and differentiable maps $f\colon X \to \mathbb{R}$,
$g\colon X \to Z$, let

$$J := \{(x, f'(x) + z^* \circ g'(x), f(x)) : z^* \in C^0,\ \langle z^*, g(x) \rangle = 0\},$$

where $C^0 := \{z^* \in Z^* : \langle z^*, z \rangle \le 0\ \forall z \in C\}$ is the polar cone of $C$. This
choice is clearly dictated by the Karush–Kuhn–Tucker optimality conditions.
But, as is well known, a solution of the mathematical programming problem

$$(\mathcal{M}) \qquad \text{minimize } f(x) \text{ subject to } g(x) \in C$$

is a critical point for $J$ only when some qualification condition is satisfied.


The approach of Ekeland to duality [14, 15] can be extended to the case of 
an arbitrary coupling (see [44]). Here we limit our study to bilinear couplings. 
The normed vector space X appearing in the following definition is usually 
a space of parameters and X’ is usually its topological dual space, but other 
cases may be considered. 


Definition 10.2. Given two normed vector spaces $X$, $X'$ paired by a bilinear
coupling function $c\colon X \times X' \to \mathbb{R}$, the Ekeland (or Legendre) map is the
mapping $E\colon X \times X' \times \mathbb{R} \to X' \times X \times \mathbb{R}$ given by

$$E(x, x', r) := (x', x, c(x, x') - r) .$$


Clearly, $E$ is a kind of involution: denoting by $E'$ the mapping $E'\colon X' \times
X \times \mathbb{R} \to X \times X' \times \mathbb{R}$ given by $E'(x', x, r) := (x, x', c(x, x') - r)$, one has
$E \circ E' = I$, $E' \circ E = I$, so that $E^{-1} = E'$ and $E'$ has a similar form. In
particular, when $X' = X$, one has $E' = E$, and $E$ is a true involution. We
show that under appropriate assumptions, the transform $E$ induces a kind
of conjugacy between functions on $X$ and on $X'$. It can also be applied to
multifunctions.


Definition 10.3. Given paired n.v.s. $X$ and $X'$, the Ekeland transform $J^E$
of a subset $J$ of $X \times X' \times \mathbb{R}$ is the image of $J$ by $E$: $J^E := E(J)$.
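
A one-dimensional sketch may help to see how the Ekeland transform recovers the classical Legendre transform. The choices below are assumptions made for illustration only: $X = X' = \mathbb{R}$, the coupling $c(x, x') = x x'$, the function $j(x) = x^2$ sampled on a finite grid, and the names `coupling`, `ekeland`, `J`, `JE`.

```python
from fractions import Fraction as F

def coupling(x, xp):
    """Bilinear coupling c(x, x') = x * x' on R x R (an illustrative choice)."""
    return x * xp

def ekeland(triple):
    """Ekeland map E(x, x', r) = (x', x, c(x, x') - r)."""
    x, xp, r = triple
    return (xp, x, coupling(x, xp) - r)

def j(x):
    return x * x

def dj(x):
    return 2 * x

grid = [F(k, 2) for k in range(-6, 7)]

# Sampled 1-jet of j and its Ekeland transform J^E = E(J)
J = {(x, dj(x), j(x)) for x in grid}
JE = {ekeland(t) for t in J}

# Sampled 1-jet of the Legendre transform j*(y) = y^2 / 4, at the points y = Dj(x)
conjugate_jet = {(y, y / 2, y * y / 4) for y in (dj(x) for x in grid)}

# Critical pairs of J: triples whose middle component is zero
ext_J = {(x, r) for (x, xp, r) in J if xp == 0}
```

In this sketch `JE` coincides with `conjugate_jet`, and applying `ekeland` a second time returns `J`, which reflects the involution property noted after Definition 10.2 when $X' = X$.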


10.3 The Ekeland Duality Scheme 


In the present chapter the decision space $X$ and the parameter space $W$ play
a symmetric role; it is not the case in [44] where $X$ is supposed to be an
arbitrary set. We assume $W$ and $X$ are n.v.s. paired with n.v.s. $W'$ and $X'$,
respectively, by couplings denoted by $c_W$, $c_X$, or just $\langle \cdot, \cdot \rangle$ if there is no risk
of confusion. Then we put $Z := W \times X$ in duality with $X' \times W'$ by the means
of the coupling $c$ given by

$$c((w, x), (x', w')) = c_W(w, w') + c_X(x, x') . \tag{10.1}$$

Such an unorthodox coupling is convenient in the sequel.

The following definition is reminiscent of the notion of perturbation which 
is one of the two main approaches to duality in convex analysis. However, 
it is taken in a more restrictive sense when J is the subjet of some convex 
function, unless the convex function is continuous. 


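Definition 10.4. Given two pairs $(W, W')$, $(X, X')$ of n.v.s. in duality, and
a subset $J \subset X \times X' \times \mathbb{R}$, a subset $P$ of $W \times X \times X' \times W' \times \mathbb{R}$ is said to be
a hyperperturbation of $J$ if

$$J = \{(x, x', r) \in X \times X' \times \mathbb{R} : \exists w' \in W',\ (0_W, x, x', w', r) \in P\} .$$

A subset $P$ of $W \times X \times X' \times W' \times \mathbb{R}$ is said to be a critical perturbation of $J$ if

$$(x, 0_{X'}, r) \in J \iff \exists w' \in W' : (0_W, x, 0_{X'}, w', r) \in P .$$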


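In other terms, $P$ is a hyperperturbation of $J$ if $J$ coincides with the
domain of the slice $P_0\colon X \times X' \times \mathbb{R} \rightrightarrows W'$ of $P$ given by

$$P_0(x, x', r) := \{w' \in W' : (0_W, x, x', w', r) \in P\} .$$

In order to study the extremization problem

$$(\mathcal{P}) \qquad \text{find } (x, r) \in X \times \mathbb{R} \text{ such that } (x, 0_{X'}, r) \in J,$$

given a critical perturbation $P$ of $J$ and a coupling $c\colon W \times W' \to \mathbb{R}$, following
Ekeland [14, 15] one can introduce the transform $P' := E(P) \subset X' \times W' \times
W \times X \times \mathbb{R}$ of $P$ given by

$$P' := \{(x', w', w, x, \langle w', w \rangle + \langle x', x \rangle - r) : (w, x, x', w', r) \in P\} .$$

The domain

$$J' := \{(w', w, r') \in W' \times W \times \mathbb{R} : \exists x \in X,\ (0_{X'}, w', w, x, r') \in P'\}$$

of the slice $P_0'\colon W' \times W \times \mathbb{R} \rightrightarrows X$ of $P'$ given by

$$P_0'(w', w, r') := \{x \in X : (0_{X'}, w', w, x, r') \in P'\}$$

yields the extremization problem

$$(\mathcal{P}') \qquad \text{find } (w', r') \in W' \times \mathbb{R} \text{ such that } (w', 0_W, r') \in J'$$

called the adjoint problem. Denoting by $\operatorname{ext} J$ the solution set of $(\mathcal{P})$ (i.e.,
the set of $(x, r) \in X \times \mathbb{R}$ such that $(x, 0_{X'}, r) \in J$) and by $\operatorname{ext} J'$ the solution
set of $(\mathcal{P}')$, one has the following result.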
Theorem 10.1. Let $J$ be a subset of $X \times X' \times \mathbb{R}$. For any critical perturbation
$P$ of $J$, the set $P' := E(P)$ defined as above is a hyperperturbation of $J'$, hence
is a critical perturbation of $J'$. Moreover, the problems $(\mathcal{P})$ and $(\mathcal{P}')$ are in
duality in the following sense.


(a) If $(w', r') \in \operatorname{ext} J'$, then $P_0'(w', 0_W, r')$ is nonempty and for any $x \in
P_0'(w', 0_W, r')$ one has $(x, -r') \in \operatorname{ext} J$.

(b) If $(x, r) \in \operatorname{ext} J$, then $P_0(x, 0_{X'}, r)$ is nonempty and for any $w' \in
P_0(x, 0_{X'}, r)$ one has $(w', -r) \in \operatorname{ext} J'$.

(c) The set of critical values of $(\mathcal{P})$ is the opposite of the set of critical
values of $(\mathcal{P}')$.
Proof. The first assertion is an immediate consequence of the definition of
$P'$ and $J'$: a pair $(w', r') \in W' \times \mathbb{R}$ is in $\operatorname{ext} J'$ if and only if there exists
some $x \in X$ such that $(0_{X'}, w', 0_W, x, r') \in P'$; that is, $x \in P_0'(w', 0_W, r')$.
For any such $x$ one has $(0_W, x, 0_{X'}, w', -r') \in P$, hence $(x, 0_{X'}, -r') \in J$ or
$(x, -r') \in \operatorname{ext} J$. Assertion (b) similarly results from the implications

$$\begin{aligned}
(x, r) \in \operatorname{ext} J &\iff (x, 0_{X'}, r) \in J \\
&\iff \exists w' \in W' : (0_W, x, 0_{X'}, w', r) \in P \\
&\iff \exists w' \in W' : (0_{X'}, w', 0_W, x, -r) \in P'
\end{aligned}$$

so that for any $w' \in P_0(x, 0_{X'}, r)$ one has $x \in P_0'(w', 0_W, -r)$; that is,
$(w', -r) \in \operatorname{ext} J'$. Assertion (c) is part of the preceding analysis.
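
The duality of Theorem 10.1 can be traced on a small finite example. The sketch below rests on assumptions introduced only for illustration: $W = X = W' = X' = \mathbb{R}$ with the product couplings, the perturbation function $p(w, x) = x^2/2 - x + wx$, and a finite sampling grid; the set `P` is the sampled 1-jet of $p$, written in the order $(w, x, x', w', r)$ used in Definition 10.4.

```python
from fractions import Fraction as F
from itertools import product

grid = [F(k, 2) for k in range(-6, 7)]

def p(w, x):
    """Illustrative perturbation function; p(0, x) = x^2/2 - x is the primal j."""
    return x * x / 2 - x + w * x

def dp_dx(w, x):
    return x - 1 + w

def dp_dw(w, x):
    return x

# Sampled perturbation set P in W x X x X' x W' x R
P = {(w, x, dp_dx(w, x), dp_dw(w, x), p(w, x)) for w, x in product(grid, grid)}

# Ekeland transform P' = E(P) in X' x W' x W x X x R, with coupling (10.1)
P_dual = {(xp, wp, w, x, wp * w + xp * x - r) for (w, x, xp, wp, r) in P}

# J is the domain of the slice w = 0 of P; J' is the domain of the slice x' = 0 of P'
J = {(x, xp, r) for (w, x, xp, wp, r) in P if w == 0}
J_dual = {(wp, w, r) for (xp, wp, w, x, r) in P_dual if xp == 0}

# Solution sets of the primal and adjoint extremization problems
ext_J = {(x, r) for (x, xp, r) in J if xp == 0}
ext_J_dual = {(wp, r) for (wp, w, r) in J_dual if w == 0}

critical_values = {r for _, r in ext_J}
adjoint_critical_values = {r for _, r in ext_J_dual}
```

With these choices the primal critical pair is $(1, -1/2)$ and the adjoint critical pair is $(1, 1/2)$, so the two sets of critical values are opposite, as assertion (c) states.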


The problem

$$(\mathcal{P}^*) \qquad \text{find } (w', r) \in W' \times \mathbb{R} \text{ such that } (w', 0_W, -r) \in J'$$

can be called the dual problem of $(\mathcal{P})$.
The preceding result is akin to [15, Proposition 3] which deals with the
enlarged problem

$$(\mathcal{E}') \qquad \text{find } (w', x, r') \in W' \times X \times \mathbb{R} \text{ such that } (0_{X'}, w', 0_W, x, r') \in P' .$$

It clearly corresponds to the problem

$$(\mathcal{E}) \qquad \text{find } (x, w', r) \in X \times W' \times \mathbb{R} \text{ such that } (0_W, x, 0_{X'}, w', r) \in P$$

via the relation $r' = -r$. [15, Proposition 3] is subsumed by the following
statement. Each of its assertions implies that $(x, r)$ is a solution to $(\mathcal{P})$ and
$(w', r')$ is a solution to $(\mathcal{P}')$ for $r = -r'$.

Proposition 10.1. For an element $(w', x, r')$ of $W' \times X \times \mathbb{R}$ the following
assertions are equivalent.


(a) $(w', x, r')$ is a solution to $(\mathcal{E}')$.
(b) $(x, r)$ with $r := -r'$ is a solution to $(\mathcal{P})$ and $w' \in P_0(x, 0_{X'}, -r')$.
(c) $(w', r')$ is a solution to $(\mathcal{P}')$ and $x \in P_0'(w', 0_W, r')$.


Proof. Each assertion is equivalent to $(0_W, x, 0_{X'}, w', -r') \in P$.


We notice that applying to $P'$ the same process, we get an enlarged problem
$(\mathcal{E}'')$ which coincides with $(\mathcal{E})$. Thus, as for $(\mathcal{P})$ and $(\mathcal{P}')$ we have an
appealing symmetry.


10.4 Tools from Nonsmooth Analysis 


A case of special interest arises when the perturbation set $P$ is the subjet
of some function $p\colon W \times X \to \mathbb{R}$. Although its Ekeland transform is not
necessarily a subjet, in some cases one can associate a function with it. In 
such a case, the dual problem becomes close to the classical dual problem, as 
we show in the following sections. In order to deal with such a nice situation 
we need to give precise definitions. 

Let us first make clear what we mean by “subdifferential.” Here, given a
n.v.s. $X$ with dual $X' = X^*$, a set $\mathcal{F}(X) \subset \mathbb{R}_\infty^X$ of functions on $X$ with values
in $\mathbb{R}_\infty$, a subdifferential is a map $\partial\colon \mathcal{F}(X) \times X \to \mathcal{P}(X')$ with values in the
space of subsets of $X'$ which associates with a pair $(f, x) \in \mathbb{R}_\infty^X \times X$ a subset
$\partial f(x)$ of $X'$ which is empty if $x$ is not in the domain $\operatorname{dom} f := \{x \in X :
f(x) \in \mathbb{R}\}$ of $f$ and such that


(M) If $x$ is a minimizer of $f$, then $0_{X'} \in \partial f(x)$.


Thus, minimizers are critical points. We do not look for a list of axioms,
although such lists exist ([4, 30-32, 39] and others). However, we may require
some other conditions such as the following ones in which $X$, $Y$, $Z$ are n.v.s.
and $L(X, Y)$ denotes the space of linear continuous maps from $X$ into $Y$:


(F) If $f$ is convex, $\partial f$ coincides with the Fenchel–Moreau subdifferential:
$\partial f(x) = \{x^* \in X^* : f(\cdot) \ge x^*(\cdot) - x^*(x) + f(x)\}$.

(T) If $f := g + h$, where $h$ is continuously differentiable at $x$, then $\partial f(x) =
\partial g(x) + Dh(x)$.

(T$_0$) If $f$ is continuously differentiable at $x$, then $\partial f(x) = \{Df(x)\}$.

(C) If $f := g \circ \ell$, where $\ell \in L(X, Y)$ and $g \in \mathbb{R}_\infty^Y$, then $\partial g(\ell(x)) \circ \ell \subset \partial f(x)$.

(C') If $f := g \circ \ell$, with $\ell \in L(X, Y)$ open, $g \in \mathbb{R}_\infty^Y$, then $\partial g(\ell(x)) \circ \ell \subset \partial f(x)$.

(O) If $f := g \circ \ell$, where $\ell \in L(X, Y)$ is open and $g\colon Y \to \mathbb{R}$ is locally
Lipschitzian, then $\partial f(x) \subset \partial g(\ell(x)) \circ \ell$.

(P) If $f := g \circ p_Y$, where $p_Y\colon Y \times Z \to Y$ is the canonical projection and
$g \in \mathbb{R}_\infty^Y$, then $\partial f(y, z) = \partial g(y) \circ p_Y$.

(D) If $f := g \circ \ell$, where $\ell \in L(X, Y)$ is an isomorphism and $g \in \mathbb{R}_\infty^Y$, then
$\partial f(x) = \partial g(\ell(x)) \circ \ell$.

Clearly (T$_0$) is a special case of the translation property (T) and (P) is
a special case of the conjunction of the composition properties (C) and (O).
Condition (D) which can be considered as a very special case of (P) is satisfied
by all usual subdifferentials. Other relationships are described in the following
statement.


Proposition 10.2. (a) If $\partial$ is either the Fréchet subdifferential or the Hadamard
subdifferential then conditions (F), (T), (C), and (O) are satisfied.

(b) If $\partial$ is either the Clarke subdifferential [6] or the moderate subdifferential
[33] then conditions (F), (T), (C'), and (O) are satisfied.


Proof. (a) The coincidence with the Fenchel subdifferential (F), the translation
property (T), the composition properties (C) and (O) are easy to check.
Let us prove the last two. Given $x \in X$, $\ell \in L(X, Y)$, and $y^* \in \partial^D g(y)$,
with $y := \ell(x)$, we observe that for every $u \in X$ we have

$$f'(x, u) \ge g'(y, \ell(u)) \ge \langle y^*, \ell(u) \rangle .$$

Thus $y^* \circ \ell \in \partial^D f(x)$. If $y^* \in \partial^F g(y)$, one can find some function $\beta\colon Y \to \mathbb{R}$
such that $\lim_{v \to 0} \beta(v) = 0$ and

$$g(y + v) - g(y) - \langle y^*, v \rangle \ge -\beta(v) \|v\|$$

for $v$ in a neighborhood $V$ of 0 in $Y$. Then, for $u \in U := \ell^{-1}(V)$ one has

$$f(x + u) - f(x) - \langle y^* \circ \ell, u \rangle \ge -\beta(\ell(u)) \|\ell\| \|u\|,$$

so that $y^* \circ \ell \in \partial^F f(x)$.

Now suppose $\ell$ is open. Because $B_Y \subset \ell(c B_X)$, for some $c > 0$, where $B_X$,
$B_Y$ are the closed unit balls of $X$ and $Y$, respectively, for every unit vector
$v \in Y$ we can pick some $u \in c B_X$ such that $\ell(u) = v$. By homogeneity, we
obtain a map $h\colon Y \to X$ such that $\ell(h(v)) = v$ and $\|h(v)\| \le c \|v\|$ for all
$v \in Y$. Let $x^* \in \partial^D f(x)$. Because $g$ is locally Lipschitzian, for all $u \in X$ we
have $\langle x^*, u \rangle \le f'(x, u) = g'(y, \ell(u))$ and $g'(y, 0) = 0$. Thus, $\langle x^*, u \rangle = 0$ for all
$u$ in the kernel $N$ of $\ell$. Because $\ell$ is open, it follows that there exists some
$y^* \in Y^*$ such that $x^* = y^* \circ \ell$. From the surjectivity of $\ell$ and the relation
$\langle y^* \circ \ell, u \rangle \le g'(y, \ell(u))$ for all $u \in X$ we conclude that $y^* \in \partial^D g(y)$. Now let
us suppose $x^* \in \partial^F f(x)$. By what precedes we obtain that there exists some
$y^* \in \partial^D g(y)$ such that $x^* = y^* \circ \ell$. Let $\alpha\colon X \to \mathbb{R}$ and $r > 0$ be such that
$\lim_{u \to 0} \alpha(u) = 0$ and

$$f(x + u) - f(x) - \langle x^*, u \rangle \ge -\alpha(u) \|u\|$$

for $u \in r B_X$. Let $h\colon Y \to X$ be the map constructed above, and let $s :=
c^{-1} r$. Because for all $v \in s B_Y$ we have $h(v) \in r B_X$, we get, as $\langle y^*, v \rangle =
\langle y^*, \ell(h(v)) \rangle = \langle x^*, h(v) \rangle$, $g(y) = f(x)$, and $f(x + h(v)) = g(\ell(x) + \ell(h(v))) =
g(y + v)$,

$$g(y + v) - g(y) - \langle y^*, v \rangle \ge -\beta(v) \|v\|$$

with $\beta(v) := c\, \alpha(h(v)) \to 0$ as $v \to 0$. Thus $y^* \in \partial^F g(y)$.
(b) Again, the assertions concerning (F) and (T) are classical and elementary. For the Clarke subdifferential, the assertions concerning (C') and (O) are particular cases of [6, Theorem 2.3.10]. The case of the moderate subdifferential is similar.


Let us stress that extremization problems are not limited to the examples mentioned in the previous sections. In particular, one may take for $J$ some subset of the closure of a subjet with respect to some topology (or convergence) on $X \times X' \times \mathbb{R}$. Another case of interest appears when $X$ is a n.v.s. and $J$ is the hypergraph of a multifunction $M: X \rightrightarrows \mathbb{R}$ associated with a notion of normal cone:

\[ H(M) := \{(x, x^*, r) \in X \times X^* \times \mathbb{R} : (x^*, -1) \in N(G(M), (x, r)),\ r \in M(x)\}, \]

where $G(M)$ is the graph of $M$ and $N(G(M), (x, r))$ denotes the normal cone to $G(M)$ at $(x, r)$. The normal cone $N(S, s)$ at $s$ to a subset $S$ of a n.v.s. $X$ can be defined in different ways. An axiomatic approach can be adopted as in [40]. When a subdifferential $\partial$ is available on the set of Lipschitzian functions on $X$, one may set $N(S, s) := \mathbb{R}_+ \partial d_S(s)$, where $d_S$ is the distance function to $S$: $d_S(x) := \inf\{d(x, y) : y \in S\}$. When the subdifferential $\partial$ is defined over the set $\mathcal{S}(X)$ of lower semicontinuous functions on $X$, one can also define $N(S, s)$ by $N(S, s) := \partial \iota_S(s)$, where $\iota_S$ is the indicator function of $S$ given by $\iota_S(x) = 0$ for $x \in S$, $+\infty$ else.
Introducing the coderivative $D^* M(x, r)$ of $M$ at $(x, r) \in G(M)$ by

\[ D^* M(x, r) := \{x^* \in X^* : (x^*, -1) \in N(G(M), (x, r))\}, \]

we see that $H(M)$ is the set of $(x, x^*, r) \in X \times X^* \times \mathbb{R}$ such that $x^* \in D^* M(x, r)$. In particular, if $M$ is the epigraph multifunction of a function $f$, $H(M)$ coincides with $J^\partial f$ whenever $x^* \in \partial f(x)$ if and only if $(x^*, -1) \in N(\mathrm{epi}(f), (x, f(x)))$.

When $M$ is a hypergraph, $E(M)$ is not necessarily a hypergraph. When $M$ is the subjet $J^\partial f$ associated with a function $f$ on $X$ and a subdifferential $\partial$, the set $E(M)$ is not necessarily the subjet of some function on $X'$. It is of interest to introduce a notion that implies part of such a requirement. This is the aim of the next section.


10.5 Ekeland and Legendre Functions 


We first delineate a class of functions for which a conjugate function can be 
defined. 


Definition 10.5. [42] Given a pairing $c$ between the n.v.s. $X$ and $X'$ and a subdifferential $\partial: \mathcal{F}(X) \times X \to \mathcal{P}(X')$, a function $f \in \mathcal{F}(X)$ is an Ekeland function with respect to $\partial$, in short a $\partial$-Ekeland function, or just an Ekeland function if there is no risk of confusion, if for any $x_1, x_2 \in X$, $x' \in X'$ satisfying $x' \in \partial f(x_1) \cap \partial f(x_2)$ one has $c(x_1, x') - f(x_1) = c(x_2, x') - f(x_2)$.

Then, the Ekeland transform of $f$ is the function $f^E: X' \to \mathbb{R}_\infty$ given by $f^E(x') := c(x, x') - f(x)$ for $x \in (\partial f)^{-1}(x')$ when $x' \in \partial f(X)$, and $f^E(x') := +\infty$ for $x' \in X' \setminus \partial f(X)$.


Thus, the graph of $f^E$ is the projection on $X' \times \mathbb{R}$ of $E(J^\partial f)$.


Example 10.9. Any convex function (on some n.v.s.) is an Ekeland function for any subdifferential satisfying condition (F). In fact, for any given $x' \in X'$, every $x \in (\partial f)^{-1}(x')$ is a maximizer of the function $c(\cdot, x') - f(\cdot)$, so that the value of this function at $x$ is independent of the choice of $x$.


Example 10.10. Any concave function on some n.v.s. $X$ is an Ekeland function for the Fréchet and the Dini-Hadamard subdifferentials. In fact, for any $x_1, x_2 \in X$, $x^* \in X^*$ satisfying $x^* \in \partial f(x_1) \cap \partial f(x_2)$ one has $\langle x^*, x_1 \rangle - f(x_1) = \langle x^*, x_2 \rangle - f(x_2)$, because in such a case $x^*$ is the Hadamard derivative of $f$ at $x_i$ ($i = 1, 2$), hence $\langle x^*, x_i \rangle - f(x_i) = \min\{\langle x^*, x \rangle - f(x) : x \in X\}$. Then $f^E$ is the restriction to $f'(X)$ of the concave conjugate $f_*$ of $f$. Similar assertions hold when $f$ is continuous.


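Example 10.11. Let $f$ be a linear-quadratic function on $X$; that is, $f(x) := \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle + c$ for some continuous symmetric linear map $A: X \to X' := X^*$, $b \in X'$, $c \in \mathbb{R}$. Let $\partial$ be a subdifferential satisfying condition (To), such as the Clarke, the Fréchet, the Hadamard, or the moderate subdifferential. Then $f$ is an Ekeland function. In fact, given $x' \in X'$, $x_1, x_2 \in X$ such that $f'(x_i) = x'$, one has

\[ \langle x', x_i \rangle - f(x_i) = \langle Ax_i - b, x_i \rangle - \tfrac{1}{2}\langle Ax_i, x_i \rangle + \langle b, x_i \rangle - c = \tfrac{1}{2}\langle Ax_i, x_i \rangle - c \]

and

\[ \langle Ax_1, x_1 \rangle - \langle Ax_2, x_2 \rangle = \langle A(x_1 - x_2), x_1 \rangle + \langle Ax_2, x_1 - x_2 \rangle = 0 \]

because $A$ is symmetric and $Ax_1 = x' + b = Ax_2$, so that $A(x_1 - x_2) = 0$. Thus, for $x' \in A(X) - b$, we can write $f^E(x') = \frac{1}{2}\langle x' + b, A^{-1}(x' + b) \rangle - c$, even if $A$ is noninvertible; here $A^{-1}(x' + b)$ stands for any solution $x$ of $Ax = x' + b$.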
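The independence claim of Example 10.11 can be checked numerically. The following is a minimal sketch, assuming $X = \mathbb{R}^2$ with the usual dot product as pairing; the matrix, the vectors, and all helper names (`matvec`, `dot`, `ekeland_value`) are illustrative choices, not taken from the text. It picks a singular symmetric $A$, two points with the same derivative, and compares the resulting values of $\langle x', x \rangle - f(x)$.

```python
def matvec(A, x):
    """Apply a matrix (list of rows) to a vector (list of floats)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Euclidean pairing <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def f(A, b, c, x):
    """Linear-quadratic function f(x) = 1/2 <Ax, x> - <b, x> + c."""
    return 0.5 * dot(matvec(A, x), x) - dot(b, x) + c

def grad_f(A, b, x):
    """Derivative f'(x) = Ax - b (A is assumed symmetric)."""
    return [ai - bi for ai, bi in zip(matvec(A, x), b)]

def ekeland_value(A, b, c, x):
    """Value <f'(x), x> - f(x), i.e. the Ekeland transform at x' = f'(x)."""
    return dot(grad_f(A, b, x), x) - f(A, b, c, x)

# Illustrative data: A is symmetric and singular, Ker A is spanned by (1, -1).
A = [[1.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
c = 0.5
x1 = [1.0, 2.0]
x2 = [3.0, 0.0]  # x2 = x1 + 2 * (1, -1), so A x1 = A x2

same_slope = grad_f(A, b, x1) == grad_f(A, b, x2)
v1 = ekeland_value(A, b, c, x1)
v2 = ekeland_value(A, b, c, x2)
closed_form = 0.5 * dot(matvec(A, x1), x1) - c  # 1/2 <Ax, x> - c

print(same_slope, v1, v2, closed_form)
```

Both points yield the same slope $x'$ and the same transform value, which also agrees with the closed form $\frac{1}{2}\langle Ax, x \rangle - c$ derived in the example.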


Example 10.12. Let $f: X \to \mathbb{R}_\infty$ be a partially quadratic function in the sense that there exist a decomposition $X = X_1 \oplus X_2$ as a topological direct sum, an isomorphism $A: X_1 \to X_1'$, where $X_1' := X_2^\perp := \{x' \in X' := X^* : x' | X_2 = 0\}$, $b \in X_1'$, $c \in \mathbb{R}$ such that $f(x) := \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle + c$ for $x \in X_1$, $f(x) = +\infty$ for $x \in X \setminus X_1$. Let $\partial$ be the Clarke, the Fréchet, the Dini-Hadamard, or the moderate subdifferential. Then, for $x \in X_1$ one has $\partial f(x) = Ax - b + X_2'$, where $X_2' := X_1^\perp$. Then, as in the preceding example, one sees that for any $x' \in X'$, $x \in (\partial f)^{-1}(x')$ the value of $\langle x', x \rangle - f(x)$ does not depend on the choice of $x$ in $(\partial f)^{-1}(x')$. Thus $f$ is an Ekeland function.
The following definition stems from the wish to get a concept which is more symmetric than the notion of Ekeland function. It is also motivated by the convex (and the concave) case in which the domain of $f^E$ is the image of $\partial f$ (respectively, $f'$), which is not necessarily convex, whereas a natural extension of $f^E$ is the Fenchel conjugate, whose domain is convex and which enjoys nice properties (lower semicontinuity, local Lipschitz property on the interior of its domain, etc.).


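Definition 10.6. Let $X$ and $X'$ be n.v.s. paired by a coupling function $c: X \times X' \to \mathbb{R}$. A l.s.c. function $f: X \to \mathbb{R}_\infty$ is said to be a (generalized) Legendre function for a subdifferential $\partial$ if there exists a l.s.c. function $f^L: X' \to \mathbb{R}_\infty$ such that

(a) $f$ and $f^L$ are Ekeland functions and $f^L | \partial f(X) = f^E | \partial f(X)$.
(b) For any $x \in \mathrm{dom} f$ there is a sequence $(x_n, x_n', r_n)_n$ in $J^\partial f$ such that $(x_n, r_n) \to (x, f(x))$ and $(c(x_n - x, x_n')) \to 0$.
(b') The symmetric condition obtained from (b) by exchanging the roles of $f$ and $f^L$ holds.
(c) The relations $x \in X$, $x' \in \partial f(x)$ are equivalent to $x' \in X'$, $x \in \partial f^L(x')$.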


Condition (b) (resp., (b')) ensures that $f$ (resp., $f^L$) is determined by its restriction to $\mathrm{dom}\, \partial f$ (resp., $\mathrm{dom}\, \partial f^L$). In fact, for any $x \in \mathrm{dom} f$ one has

\[ f(x) = \liminf_{u \to x,\ u \in \mathrm{dom}\, \partial f} f(u) \]

because $f(x) \le \liminf_{u \to x} f(u)$ and (b) implies $f(x) = \lim_n f(x_n)$ for some sequence $(x_n) \to x$ in $\mathrm{dom}\, \partial f$. Moreover, conditions (a) and (b') imply that $f^L$ is determined by $f$.

Condition (b) can be simplified when $\partial f$ is locally bounded on the domain of $f$. In that case, condition (b) is equivalent to the simpler condition

(b$_0$) For any $x \in \mathrm{dom} f$ there exists a sequence $(x_n)_n$ in $\mathrm{dom}\, \partial f$ such that

\[ (x_n, f(x_n)) \to (x, f(x)). \]


Example 10.13. Any classical Legendre function is a (generalized) Legendre function. We say that a function $f: U \to \mathbb{R}$ on an open subset $U$ of a n.v.s. $X$ is a classical Legendre function if it is of class $C^2$ on $U$ and if its derivative $Df$ is a diffeomorphism from $U$ onto an open subset $U'$ of $X^*$. In fact, one can show that it suffices that $f$ be of class $C^1$ and that its derivative $Df$ be a locally Lipschitzian homeomorphism from $U$ onto an open subset $U'$ of $X^*$ whose inverse is also locally Lipschitzian. See [42, 43] for such refinements.

In particular, let $f$ be the linear-quadratic function on $X$ given by $f(x) := \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle + c$ for some symmetric isomorphism $A: X \to X' := X^*$, $b \in X'$, $c \in \mathbb{R}$. Then $f$ is a classical Legendre function because $Df: x \mapsto Ax - b$ is a diffeomorphism.
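For the linear-quadratic case just described, the transform and its symmetry can be made concrete in one dimension. This is a minimal sketch, assuming $X = \mathbb{R}$ and a nonzero scalar $a$ playing the role of $A$ (no sign condition, so $f$ may be concave); the function names are hypothetical. It computes $f^E(x') = x' h(x') - f(h(x'))$ with $h := (Df)^{-1}$, compares it with the closed form $(x'+b)^2/(2a) - c$, and checks that transforming twice gives back $f$.

```python
def make_quadratic(a, b, c):
    """Return f, its derivative Df, and the inverse h of Df, for a != 0."""
    def f(x):
        return 0.5 * a * x * x - b * x + c

    def df(x):
        return a * x - b

    def h(xp):
        return (xp + b) / a

    return f, df, h

def ekeland_transform(f, h):
    """Ekeland transform x' -> <h(x'), x'> - f(h(x')) of a function with invertible derivative."""
    def fe(xp):
        return h(xp) * xp - f(h(xp))
    return fe

a, b, c = -2.0, 1.0, 3.0          # illustrative data; a < 0 gives a concave f
f, df, h = make_quadratic(a, b, c)
fe = ekeland_transform(f, h)

xp = 3.0
closed_form = (xp + b) ** 2 / (2.0 * a) - c

# The derivative of fe is h, whose inverse is df, so the transform of fe
# at the slope x = h(xp) is <df(x), x> - fe(df(x)).
fee = ekeland_transform(fe, df)
x = h(xp)

print(fe(xp), closed_form, fee(x), f(x))
```

The last two printed values coincide, which illustrates the symmetry between $f$ and its transform that the notion of Legendre function is designed to capture.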


Example 10.14. A variant is the notion of Legendre-Hadamard function. A function $f: U \to \mathbb{R}$ on an open subset $U$ of a normed vector space $X$ is a Legendre-Hadamard function if it is Hadamard differentiable, if its derivative $Df: U \to X' := X^*$ is a bijection onto an open subset $U'$ of $X'$ which is continuous when $X$ is endowed with its strong topology and $X'$ is endowed with the topology of uniform convergence on compact subsets, its inverse $h$ satisfying a similar continuity property and the Ekeland transform $f^E$ of $f$ given by

\[ f^E(x') := \langle h(x'), x' \rangle - f(h(x')), \qquad x' \in U', \]

being Hadamard differentiable with derivative $h$. Then $f$ and $f^E$ are of class $T^1$ in the sense that they are Hadamard differentiable and the functions $df: U \times X \to \mathbb{R}$ and $df^E: U' \times X' \to \mathbb{R}$ given by $df(u, x) := Df(u)(x)$, $df^E(u', x') := Df^E(u')(x')$ are continuous (see [37]). Then $f$ is a generalized Legendre function for the Dini-Hadamard subdifferential. In fact, if $x' \in \partial f(x)$ for some $x \in U$, one has $x' = Df(x)$, hence $x = h(x')$, $f^E(x') = \langle x, x' \rangle - f(x)$ and $\partial f^E(x') = \{h(x')\} = \{x\}$, so that conditions (a) and (c) of the preceding definition are satisfied. Conditions (b) and (b') are immediate; in fact, for any $x \in U$ and any sequence $(x_n) \to x$ one has $(\langle x_n - x, f'(x_n) \rangle) \to 0$, and a similar property holds for $f^E$ by the assumed continuity property.


Let us give a criterion which has some analogy with the one we gave in the preceding example. Now, the differentiability assumption on $f$ is weaker, but the local Lipschitz condition on the inverse $h$ of $Df$ is changed into the assumption that for any $x' \in U'$ the map $h$ is directionally compact at $x'$ in the following sense: for any $v' \in X'$ and any sequences $(v_n') \to v'$, $(t_n) \to 0_+$ the sequence $(t_n^{-1}(h(x' + t_n v_n') - h(x')))$ is contained in a compact set. Such an assumption is satisfied when $h$ is Hadamard differentiable at any $x'$ or when $X$ is finite-dimensional and $h$ is locally Lipschitzian.


Proposition 10.3. Suppose $f$ is of class $T^1$ and its derivative $Df$ is a bijection from $U$ onto an open subset $U'$ whose inverse $h$ is directionally compact at every point of $U'$. Suppose the mappings $df: (u, v) \mapsto Df(u)(v)$ and $(x', v') \mapsto h(x')(v')$ are continuous from $U \times X$ into $\mathbb{R}$ and from $U' \times X'$ into $\mathbb{R}$, respectively. Then $f$ is a Legendre-Hadamard function.


Proof. It suffices to prove that $f^E$ is Hadamard differentiable at any $x' \in U'$, with derivative $h(x')$. Let $v' \in X'$ and let $(v_n') \to v'$, $(t_n) \to 0_+$. Let us set $v_n := t_n^{-1}(h(x' + t_n v_n') - h(x'))$, $x := h(x')$. By our assumption of directional compactness, $(v_n)$ is contained in a compact subset of $X$, so that $\alpha_n := t_n^{-1}(f(x + t_n v_n) - f(x) - t_n Df(x)(v_n))$ has limit $0$. Since $x' = Df(x)$, we get

\[ \begin{aligned} f^E(x' + t_n v_n') - f^E(x') &= \langle x' + t_n v_n', h(x' + t_n v_n') \rangle - f(h(x' + t_n v_n')) - \langle x', h(x') \rangle + f(h(x')) \\ &= \langle x' + t_n v_n', x + t_n v_n \rangle - f(x + t_n v_n) - \langle x', x \rangle + f(x) \\ &= t_n \langle v_n', x \rangle + t_n \langle x', v_n \rangle + t_n^2 \langle v_n', v_n \rangle - t_n Df(x)(v_n) - t_n \alpha_n \\ &= t_n \langle v_n', x \rangle + t_n \beta_n \end{aligned} \]

with $\beta_n := t_n \langle v_n', v_n \rangle - \alpha_n \to 0$. This shows that $f^E$ is Hadamard differentiable at $x'$, with derivative $x := h(x')$.


Example 10.15. Any l.s.c. proper convex function $f$ is a (generalized) Legendre function. In fact, a slight strengthening [38, Proposition 1.1] of the Brøndsted-Rockafellar theorem ensures that for any $x \in \mathrm{dom} f$ there exists a sequence $(x_n, x_n^*)$ in the graph of $\partial f$ such that $(\langle x_n - x, x_n^* \rangle) \to 0$ and $(f(x_n)) \to f(x)$. The same is valid for the Fenchel conjugate function $f^L = f^*$. Moreover, as is well known, condition (c) holds in such a case.


Example 10.16. Let $f: X \to \mathbb{R} \cup \{-\infty\}$ be a concave function such that $U := \mathrm{dom}(-f)$ and $U' := \mathrm{dom}((-f)^*)$ are open and $f$ and its concave conjugate $f_*$ are differentiable on $U$ and $U'$, respectively; here $f_*$ is given by $f_*(x') = \inf_{x \in X}(\langle x', x \rangle - f(x)) = -(-f)^*(-x')$ and differentiability is taken in the sense of Fréchet (resp., Hadamard) when one takes the Fréchet (resp., Hadamard) subdifferential. Then $f$ is a generalized Legendre function for this subdifferential $\partial$. In fact, $x' \in \partial f(x)$ if and only if $f$ is Fréchet (resp., Hadamard) differentiable at $x$ and $x' = f'(x)$. Then, for $g := -f$, one has $-x' = g'(x)$, hence $x \in \partial g^*(-x')$. Because $f_*$ is supposed to be differentiable, $g^*$ is also differentiable and $x = (g^*)'(-x') = (f_*)'(x') \in \partial f_*(x')$. Moreover, one has $f^E(x') = \langle x', x \rangle - f(x) = f_*(x')$. Condition (b) is satisfied because for any $x \in U$ one can take $(x_n, x_n', r_n) = (x, f'(x), f(x))$. Because the roles of $f$ and $f_*$ are symmetric, we see that $f$ is a generalized Legendre function.


Remark. Let $U$ be an open convex subset of an Asplund space $X$ with the Radon-Nikodym property. Let $f$ be a continuous concave function on $U$ such that its concave conjugate $f_*$ is finite and continuous on an open convex subset $U'$ of $X'$ and $-\infty$ on $X' \setminus U'$. Let $\partial$ be either the Fréchet or the Hadamard subdifferential and let $f^L := f_*$. As in the preceding example we see that for any $x' \in \partial f(X)$ one has $f^L(x') = f^E(x')$. By definition of an Asplund space, $f$ is Fréchet differentiable on a dense subset $D$ of $U$. Because it is also locally Lipschitzian, its derivative is locally bounded on $D$. Thus, if $x \in U$ and if $(x_n)$ is a sequence of $D$ with limit $x$, then $(\langle f'(x_n), x_n - x \rangle) \to 0$. Now, because $f_*$ is defined on an open convex subset and is continuous and upper semicontinuous for the weak* topology, it is also Fréchet differentiable on a dense subset of its domain by a result of Collier [11], and by a similar argument we see that condition (b') is satisfied. However, condition (c) is not necessarily satisfied. For example, let $X$ be a Hilbert space, and let $f$ be given by $f(x) := -\max(\|x\|, \|x\|^2)$. Then $f_*(x') = \min(0, 1 - \|x'\|)$ for $x' \in 2B'$, where $B'$ is the closed unit ball of $X'$, and $f_*(x') = -\frac{1}{4}\|x'\|^2$ for $x' \in X' \setminus 2B'$. Let $u$ be a unit vector in $X$ and let $u' \in X'$ be such that $\langle u', u \rangle = -2$, $\|u'\| = 2$; then we have $u \in \partial^F f_*(u')$ but $u' \notin \partial^F f(u)$ because $f$ is not Fréchet differentiable at $u$.


Example 10.17. Let $\partial$ be a subdifferential such that $\partial(-f)(x) = -\partial f(x)$ when $f$ is locally Lipschitzian around $x$. For instance, $\partial$ may be the Clarke subdifferential [6], the moderate subdifferential [33], or be given as $\partial f(x) := \partial^F f(x) \cup (-\partial^F(-f)(x))$ or $\partial^D f(x) \cup (-\partial^D(-f)(x))$. Let $f$ be a concave function such that $-f$ and $-f_*$ have open domains and are continuous on their domains. Then $f$ is a generalized Legendre function. In fact, using the notation $g := -f$ and arguments as in the preceding example, we see that if $x' \in \partial f(x)$ we also have $-x' \in \partial g(x)$, hence $x \in \partial g^*(-x') = \partial(-f_* \circ (-I_{X'}))(-x') = \partial f_*(x')$.


10.6 The Fenchel—Rockafellar Duality 


A particular case requires some developments. It concerns the case when $W$ and $X$ are n.v.s. with dual spaces $W'$ and $X'$, respectively, and when a subset $K$ of $W \times X \times X' \times W' \times \mathbb{R}$ and a densely defined linear mapping $A: X \to W$ with closed graph and transpose $A^T$ are given such that

\[ J := \{(x, x', r) : \exists u' \in X',\ w' \in W',\ (Ax, x, u', w', r) \in K,\ x' = u' + A^T w'\}. \qquad (10.2) \]

Again, we consider that $W \times X$ and $X' \times W'$ are paired with the coupling $c$ of (10.1), which defines an isomorphism $\gamma: (W \times X)^* \to X' \times W'$. Thus the primal problem is

(P) find $(x, r) \in X \times \mathbb{R}$ such that $\exists w' \in W'$, $(Ax, x, -A^T w', w', r) \in K$.


The special case when $K$ is the image by $\hat\gamma := I_{W \times X} \times \gamma \times I_{\mathbb{R}}$ of the subjet of a function $k: W \times X \to \mathbb{R}_\infty$, $A$ is continuous and

\[ j(x) := k(Ax, x), \qquad \partial j(x) = \partial k(Ax, x) \circ (A, I_X) \]

deserves some interest and illustrates what follows. More explicitly, in such a case one has

\[ K := \{(w, x, x', w', r) : (x', w') \in \partial k(w, x),\ r = k(w, x)\}. \]

This case is considered later on. Let us note that when $K$ is the subjet of such a function $k$ and when $\partial$ satisfies condition (C), the set $J$ contains the subjet of $j$. But one may have $J \ne J^\partial j$ when $j = k \circ (A, I_X)$. For $j$ of this form, a natural perturbation of $j$ is given by $p(w, x) := k(w + Ax, x)$ for $(w, x) \in W \times X$. Such a perturbation may inspire a hyperperturbation $P$ in the general case, to which we return.

Given $A$, $K$, and $J$ as in (10.2), we can introduce $P$ by setting

\[ \begin{aligned} P &:= \{(w, x, u' + A^T w', w', r) : u' \in X',\ w' \in W',\ (Ax + w, x, u', w', r) \in K\} \\ &= \{(w, x, x', w', r) : (Ax + w, x, x' - A^T w', w', r) \in K\}. \end{aligned} \]

Then $J$ is the domain of the slice $P_0: X \times X' \times \mathbb{R} \rightrightarrows W'$ of $P$ given by

\[ P_0(x, x', r) := \{w' \in W' : (0_W, x, x', w', r) \in P\}, \]

so that $P$ is a hyperperturbation of $J$. The Ekeland transform $P' := E(P) \subset X' \times W' \times W \times X \times \mathbb{R}$ of $P$ is given by

\[ \begin{aligned} P' &:= \{(u' + A^T w', w', w, x, \langle w', w \rangle + \langle u' + A^T w', x \rangle - r) : u' \in X',\ w' \in W',\ (Ax + w, x, u', w', r) \in K\} \\ &= \{(x', w', w, x, r') : (Ax + w, x, x' - A^T w', w', \langle w', w \rangle + \langle x', x \rangle - r') \in K\}, \end{aligned} \]

and the domain $J'$ of the slice $P_0': W' \times W \times \mathbb{R} \rightrightarrows X$ of $P'$ defined by

\[ P_0'(w', w, r') := \{x \in X : (0_{X'}, w', w, x, r') \in P'\} \]

is

\[ J' = \{(w', w, r') : \exists x \in X,\ (Ax + w, x, -A^T w', w', \langle w', w \rangle - r') \in K\}, \]

and the adjoint problem is

(P') find $(w', r') \in W' \times \mathbb{R}$ such that $\exists x \in X$, $(Ax, x, -A^T w', w', -r') \in K$.

Equivalently, because $\langle w', Ax \rangle + \langle -A^T w', x \rangle = 0$, we have

(P') find $(w', r') \in W' \times \mathbb{R}$ such that $\exists x \in X$, $(-A^T w', w', Ax, x, r') \in E(K)$.


Thus, (P') is obtained from $E(K)$ in a way similar to the one (P) is deduced from $K$, with $-A^T$, $X'$, $W'$, $X$, $W$ substituted for $A$, $W$, $X$, $W'$, $X'$, respectively. When $A$ is continuous, $k$ is a generalized Legendre function, and $K := \hat\gamma(J^\partial k)$ for some subdifferential $\partial$, one has

\[ (w', u') \in \partial k(Ax + w, x) \iff (Ax + w, x) \in \partial k^L(u', w') \]

so that $P'$ is obtained from $K' := \hat\gamma'(J^\partial k^L)$ as $P$ is obtained from $K := \hat\gamma(J^\partial k)$, where $\hat\gamma'$ is a transposition similar to $\hat\gamma$. Then (P') is a substitute for the extremization of the function $j': w' \mapsto k^L(-A^T w', w')$. Under appropriate assumptions, the preceding guideline becomes a precise result.


Lemma 10.1. Given a function $k: W \times X \to \mathbb{R}_\infty$ finite at $(\bar w, \bar x) \in W \times X$ and a continuous linear map $A: X \to W$, let $f: W \times X \to \mathbb{R}_\infty$ be given by $f(w, x) := k(Ax + w, x)$. Then, for any subdifferential satisfying condition (D), one has

\[ (w, x, w', u' + A^T w', r) \in J^\partial f \iff (Ax + w, x, w', u', r) \in J^\partial k, \]

so that $P$ is the subjet of $f$ up to a transposition.
Proof. The result amounts to

\[ (w', u' + A^T w') \in \partial f(w, x) \iff (w', u') \in \partial k(Ax + w, x). \]

It stems from condition (D), because the map $B: (w, x) \mapsto (Ax + w, x)$ is an isomorphism with inverse $(z, x) \mapsto (z - Ax, x)$, as a simple computation of the transpose $B^T$ of $B$ shows.


Proposition 10.4. Let $W$ and $X$ be reflexive Banach spaces with dual spaces $W'$ and $X'$, respectively, and let $A: X \to W$ be linear and continuous. Let $k: W \times X \to \mathbb{R}_\infty$ be a generalized Legendre function and let $K := J^\partial k$ be its subjet. Then, for any subdifferential satisfying condition (D), the extremization problem (P') is the extremization problem associated with the hyperperturbation $P' = J^\partial p'$ of $J'$, where

\[ p'(x', w') := k^L(w', x' - A^T w'), \qquad (x', w') \in X' \times W'. \]

Proof. Using the preceding lemma with a change of notation, we have

\[ \begin{aligned} (x, z - Ax) \in \partial p'(x', w') &\iff (x, z) \in \partial k^L(x' - A^T w', w') \\ &\iff (x' - A^T w', w') \in \partial k(x, z). \end{aligned} \]

Then, the definition of $P'$ given above gives the result.


Let us observe that when $k$ is convex one gets the generalized Fenchel-Rockafellar duality (see for instance [47, Corollary 2.8.2]):

\[ \inf_{x \in X} k(Ax, x) = \max_{w' \in W'} \left(-k^*(w', -A^T w')\right). \]


Proposition 10.5. Let $W$ and $X$ be reflexive Banach spaces with dual spaces $W'$ and $X'$, respectively, and let $A: X \to W$ be linear and continuous. Let $k: W \times X \to \mathbb{R}_\infty$ be a l.s.c. proper convex function such that

\[ \mathbb{R}_+ \left(\mathrm{dom}\, k^* - (I, -A^T)(W')\right) \qquad (10.3) \]

is a closed vector subspace of $W' \times X'$. Then the extremization problem (P') coincides with the minimization problem

minimize $k^*(w', -A^T w')$, $w' \in W'$.

Proof. When $k$ is a l.s.c. proper convex function, it is a generalized Legendre function and $k^L = k^*$, the Fenchel transform of $k$. Moreover, under the qualification condition (10.3), the Attouch-Brézis theorem ensures that for the convex function $j': w' \mapsto k^L(w', -A^T w')$ one has

\[ \begin{aligned} \partial j'(w') &= \{w - Ax : (w, x) \in \partial k^*(w', -A^T w')\} \\ &= \{w - Ax : (w', -A^T w') \in \partial k(w, x)\}. \end{aligned} \]
The next result deals with the particular case in which $k(w, x) = g(w - b) + f(x)$ for $(w, x) \in W \times X$, where $f: X \to \mathbb{R}_\infty$, $g: W \to \mathbb{R}_\infty$ are l.s.c. proper convex functions and $b \in W$ is fixed. It follows from an easy computation of $k^*$: $k^*(w', x') = g^*(w') + \langle w', b \rangle + f^*(x')$. Then one obtains that condition (10.3) is satisfied if and only if $\mathbb{R}_+(\mathrm{dom} f^* + A^T(W'))$ is a closed vector subspace of $X'$.


Corollary 10.1. Let $W$ and $X$ be reflexive Banach spaces with dual spaces $W'$ and $X'$, respectively, let $A: X \to W$ be linear and continuous, and let $f: X \to \mathbb{R}_\infty$, $g: W \to \mathbb{R}_\infty$ be l.s.c. proper convex functions such that

\[ \mathbb{R}_+ \left(\mathrm{dom} f^* + A^T(W')\right) \qquad (10.4) \]

is a closed vector subspace of $X'$. Then the extremization problem (P') coincides with the minimization problem

minimize $f^*(-A^T w') + g^*(w') + \langle w', b \rangle$, $w' \in W'$.
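The dual problem of Corollary 10.1 can be compared with the primal one on a scalar instance. The following is a minimal sketch, assuming $X = W = \mathbb{R}$, $f(x) = \frac{1}{2}x^2$, $g(w) = \frac{1}{2}w^2$ (so that $f^* = f$ and $g^* = g$) and a scalar $a$ in place of $A$; these data and the function names are illustrative assumptions. It evaluates the primal objective $x \mapsto g(ax - b) + f(x)$ and the dual objective $w' \mapsto f^*(-a w') + g^*(w') + w' b$ at their closed-form minimizers and checks that the optimal values are opposite, in line with the Fenchel-Rockafellar relation quoted above.

```python
a, b = 2.0, 3.0  # illustrative scalar data

def primal(x):
    """k(Ax, x) = g(ax - b) + f(x) with f = g = (1/2)(.)^2."""
    return 0.5 * (a * x - b) ** 2 + 0.5 * x ** 2

def dual(wp):
    """f*(-A^T w') + g*(w') + <w', b> with f* = g* = (1/2)(.)^2."""
    return 0.5 * (a * wp) ** 2 + 0.5 * wp ** 2 + wp * b

# Closed-form minimizers obtained by setting the derivatives to zero.
x_bar = a * b / (1.0 + a * a)
w_bar = -b / (1.0 + a * a)

primal_value = primal(x_bar)
dual_value = dual(w_bar)

# Coarse grid check that the closed-form points are indeed minimizers.
grid = [i / 100.0 for i in range(-500, 501)]
primal_grid_min = min(primal(x) for x in grid)
dual_grid_min = min(dual(w) for w in grid)

print(primal_value, dual_value)
```

In this instance the optimal pair also satisfies $w' = g'(a x - b)$ and $-a w' = f'(x)$, which are the subdifferential relations defining the set $K$ in this setting.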


Let us note that when $\mathbb{R}_+(\mathrm{dom}\, k - (A, I_X)(X))$ is a closed vector subspace of $W \times X$, the set $J$ is the subjet of the function $j$, so that the situation is entirely symmetric. However, such a condition is not required to apply the duality relationships described in the preceding results.


10.7 The Toland Duality 


In [15] Ekeland applies his duality scheme to the case of the Toland duality. 
The primal problem is 


(T) ext $f(x) - g(Ax)$, $x \in X$,

where $g: W \to \mathbb{R}$ and $f: X \to \mathbb{R}_\infty$ are l.s.c. proper convex functions and $A: X \to W$ is a continuous linear map. We interpret it as the extremization of the set

\[ J := \{(x, x', r) \in X \times X' \times \mathbb{R} : \exists w' \in \partial g(Ax),\ \exists u' \in \partial f(x),\ x' = u' - A^T w'\}. \]

However, we do not claim that $J$ is the subjet of $j: x \mapsto f(x) - g(Ax)$. Thus, instead of using the subjet of $k: (w, x) \mapsto f(x) - g(w)$, we introduce the set

\[ K := \{(w, x, x', w', r) : -w' \in \partial g(w),\ x' \in \partial f(x),\ r = f(x) - g(w)\}. \]

Now we set

\[ P := \{(w, x, x', w', r) : w' \in \partial g(Ax - w),\ \exists u' \in \partial f(x),\ x' = u' - A^T w'\}, \]

which can be thought of as a similar interpretation of the subjet of $p: (w, x) \mapsto f(x) - g(Ax - w)$. Moreover,

\[ \begin{aligned} P &:= \{(w, x, u' + A^T w', w', r) : u' \in X',\ w' \in W',\ (Ax - w, x, u', w', r) \in K\} \\ &= \{(w, x, x', w', r) : (Ax - w, x, x' - A^T w', w', r) \in K\}. \end{aligned} \]


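Then $J$ is the domain of the slice $P_0: X \times X' \times \mathbb{R} \rightrightarrows W'$ of $P$ given by

\[ P_0(x, x', r) := \{w' \in W' : (0_W, x, x', w', r) \in P\}, \]

so that $P$ is a hyperperturbation of $J$. The Ekeland transformed set $P' := E(P) \subset X' \times W' \times W \times X \times \mathbb{R}$ of $P$ is given by

\[ \begin{aligned} P' &= \{(x', w', w, x, r') : (Ax - w, x, x' - A^T w', w', \langle w', w \rangle + \langle x', x \rangle - r') \in K\} \\ &= \{(u' + A^T w', w', w, x, \langle w', w \rangle + \langle u' + A^T w', x \rangle - r) : (Ax - w, x, u', w', r) \in K\} \end{aligned} \]

and the domain $J'$ of the slice $P_0': W' \times W \times \mathbb{R} \rightrightarrows X$ of $P'$ defined by

\[ P_0'(w', w, r') := \{x \in X : (0_{X'}, w', w, x, r') \in P'\} \]

is

\[ J' = \{(w', w, r') : \exists x \in X,\ (Ax - w, x, -A^T w', w', \langle w', w \rangle - r') \in K\}. \]

Thus the adjoint problem is

(P') find $(w', r') \in W' \times \mathbb{R}$ such that $\exists x \in X$, $(Ax, x, -A^T w', w', -r') \in K$.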
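We observe that, because $\langle x' - A^T w', x \rangle + \langle w', Ax - w \rangle = \langle w', -w \rangle + \langle x', x \rangle$, by the Fenchel equality

\[ \begin{aligned} \langle w', w \rangle + \langle x', x \rangle - f(x) + g(Ax + w) &= \langle x' - A^T w', x \rangle - \langle -w', Ax + w \rangle - f(x) + g(Ax + w) \\ &= f^*(x' - A^T w') - g^*(-w'). \end{aligned} \]

Introducing the set $K' := E(K)$,

\[ \begin{aligned} K' &= \{(x', w', w, x, r') : (-w', x') \in \partial g(w) \times \partial f(x),\ r' = \langle (w', x'), (w, x) \rangle - f(x) + g(w)\} \\ &= \{(x', w', w, x, r') : w \in \partial g^*(-w'),\ x \in \partial f^*(x'),\ r' = f^*(x') - g^*(-w')\}, \end{aligned} \]

which corresponds to the subjet of $(x', w') \mapsto f^*(x') - g^*(-w')$, we have

\[ \begin{aligned} P' &= \{(x', w', w, x, r') : (x' - A^T w', w', Ax - w, x, r') \in K'\} \\ &= \{(x', w', w, x, f^*(x' - A^T w') - g^*(-w')) : Ax - w \in \partial g^*(-w'),\ x \in \partial f^*(x' - A^T w')\}, \end{aligned} \]

\[ \begin{aligned} J' &:= \{(w', w, r') : \exists x \in X,\ Ax - w \in \partial g^*(-w'),\ x \in \partial f^*(-A^T w'),\ r' = f^*(-A^T w') - g^*(-w')\} \\ &= \{(w', w, r') : \exists u \in \partial \hat g(A^T w'),\ w + Au \in \partial \hat f(w'),\ r' = \hat g(A^T w') - \hat f(w')\}, \end{aligned} \]

where $\hat f(w') := g^*(-w')$, $\hat g(x') := f^*(-x')$.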

Therefore, replacing $f$, $g$, $A$ by $\hat f$, $\hat g$, $A^T$, and using a construction similar to the one we have used to pass from $j$ to $J$, the adjoint problem can be interpreted as

(T') ext $\hat f(w') - \hat g(A^T w')$, $w' \in W'$.

This is the Toland duality. Note that if we use a subdifferential $\partial$ such that $\partial(-g)(x) = -\partial g(x)$ for a convex function $g$, and if regularity assumptions ensuring a sum rule are available, the preceding constructions are no longer merely formal.


10.8 The Wolfe Duality 


Let us give a version of the Wolfe duality [46, 12, 34, 35, 36] that involves 
a family of minimization problems rather than a single one; we show that it 
can be interpreted as an instance of the Ekeland duality. 

Given a set $U$, n.v.s. $W$, $X$, a closed convex cone $C$ in $W$, and mappings $f: U \times X \to \mathbb{R}$ and $g: U \times X \to W$ which are differentiable in their second variable, let us consider the constrained optimization problem

(M) minimize $f(u, x)$ under the constraint $g(u, x) \in C$.

We consider (M) as a minimization problem with respect to a primary variable $x$ and a second variable $u$, or as a family of partial minimization problems

(M$_u$) minimize $f_u(x)$ under the constraint $g_u(x) \in C$, $u \in U$.


The variant of the Wolfe dual we deal with is the family of partial maximization problems indexed by $u \in U$,

(W$_u$) maximize $\ell_u(x, y)$ over $(x, y) \in X \times Y$ subject to $\frac{\partial}{\partial x} \ell_u(x, y) = 0$, $y \in C^0$,

where $\ell_u(x, y) := f_u(x) + \langle y, g_u(x) \rangle$ is the classical Lagrangian, $Y$ is the dual of $W$, and $C^0 := \{y \in Y : \forall w \in C,\ \langle y, w \rangle \le 0\}$. We observe that in (W$_u$) the implicit constraint $g(u, x) \in C$, which is difficult to deal with, has disappeared, and an easier equality constraint appears.

Then one has the following result, whose proof is similar to the one in [12, 
Theorem 4.7.1]. 


Theorem 10.2. Suppose that for all $u \in U$ and all $y \in -C^0$ the functions $f_u$ and $y \circ g_u$ are convex. Then, for all $u \in U$ one has the weak duality relation

\[ \sup(\mathrm{W}_u) \le \inf(\mathrm{M}_u). \]

If (M) has a solution, then there exists some $u \in U$ such that strong duality holds; that is, the preceding inequality is an equality.


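In order to relate this result to the Ekeland scheme, for $u \in U$ we introduce the subset

\[ J_u := \{(x, y, x', y', r) : r = f_u(x),\ g_u(x) \in C,\ x' = Df_u(x) + y \circ Dg_u(x),\ y' = g_u(x),\ \langle y, g_u(x) \rangle = 0\} \]

of $X \times C^0 \times X' \times W \times \mathbb{R}$, so that $J_u$ is the intersection of $\{(x, y, x', y', r) \in X \times C^0 \times X' \times W \times \mathbb{R} : \langle y, g_u(x) \rangle = 0\}$ with the one-jet

\[ J^1 \ell_u := \{(x, y, x', y', r) : (x', y') = D\ell_u(x, y),\ r = \ell_u(x, y)\} \]

of the function $\ell_u$. The extremization of $J_u$ consists in searching pairs $(x, y) \in X \times C^0$ which are critical points of $\ell_u$ with respect to $X \times C^0$, that is, which satisfy

\[ \frac{\partial}{\partial x} \ell_u(x, y) = 0, \qquad \frac{\partial}{\partial y} \ell_u(x, y) \in C, \qquad \left\langle y, \frac{\partial}{\partial y} \ell_u(x, y) \right\rangle = 0. \]

This is exactly the set of solutions of the Kuhn-Tucker system.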
It is natural to associate with (M$_u$) the problem perturbed by $w \in W$:

(M$_{u,w}$) minimize $f_u(x)$ under the constraint $g_u(x) + w \in C$.

We associate with this problem the subset $P$ of the set $W \times g_u^{-1}(C) \times C^0 \times X' \times Y' \times W' \times \mathbb{R}$ given by

\[ (w, x, y, x', y', w', r) \in P \iff x' = Df_u(x) + y \circ Dg_u(x),\ y' = g_u(x) + w,\ w' = y,\ r = f_u(x) + \langle y, g_u(x) + w \rangle. \]

It is clearly a hyperperturbation of $J_u$. A short computation shows that its Ekeland transform $P'$ is characterized by $(w', x', y', w, x, y, r') \in P'$ if and only if $(w, x, y, x', y', w', r') \in W \times g_u^{-1}(C) \times C^0 \times X' \times Y' \times W' \times \mathbb{R}$ and

\[ r' = \langle w', w \rangle + \langle x', x \rangle - f_u(x), \quad w' = y, \quad x' = Df_u(x) + y \circ Dg_u(x), \quad y' = g_u(x) + w. \]
Thus, considering $w$ as a parameter and $(x, y)$ as the decision variable, we can set

\[ J_u' := \{(w', w, r') : \exists (x, y) \in g_u^{-1}(C) \times C^0,\ (w', 0_{X'}, 0_{Y'}, w, x, y, r') \in P'\}. \]

We obtain that $(w', w, r') \in J_u'$ if and only if there exists $(x, y) \in g_u^{-1}(C) \times C^0$ such that $y := w'$,

\[ Df_u(x) + y \circ Dg_u(x) = 0, \quad g_u(x) + w \in C, \quad \langle y, g_u(x) + w \rangle = 0, \quad r' = \langle w', w \rangle - f_u(x). \]

Then $r' = \langle y, -g_u(x) \rangle - f_u(x) = -\ell_u(x, y)$.
We see that ext(M$_u'$) corresponds to the search of $(w', r', x) \in Y \times \mathbb{R} \times X$ such that

\[ Df_u(x) + y \circ Dg_u(x) = 0, \quad g_u(x) \in C, \quad y \in C^0, \quad \langle w', g_u(x) \rangle = 0, \quad r' = -\ell_u(x, y); \]

or, in other terms, to the search of $(x, y, r') \in g_u^{-1}(C) \times C^0 \times \mathbb{R}$ such that $\partial \ell_u(x, y)/\partial x = 0$, $\partial \ell_u(x, y)/\partial y \in C$, $r' = -\ell_u(x, y)$:

\[ (y, r') \in \mathrm{ext}(\mathrm{M}_u') \iff \exists x \in X : g_u(x) \in C,\ \langle y, g_u(x) \rangle = 0,\ Df_u(x) + y \circ Dg_u(x) = 0,\ r' = -\ell_u(x, y). \]

Now $(x, y)$ is a critical point for the problem

(M$_u'$) maximize $\ell_u(x, y)$ over $(x, y) \in X \times Y$ under the constraints $g_u(x) \in C$, $\frac{\partial}{\partial x} \ell_u(x, y) = 0$

if and only if there exist multipliers $\bar y \in C^0$, $\bar x^{**} \in X^{**}$ such that for all $(\hat x, \hat y) \in X \times Y$,

\[ \langle \bar y, g_u(x) \rangle = 0, \]
\[ -D\ell_u(x, y)(\hat x, \hat y) + \langle \bar y, Dg_u(x)(\hat x) \rangle + \left\langle \bar x^{**}, D\tfrac{\partial}{\partial x}\ell_u(x, y)(\hat x, \hat y) \right\rangle = 0. \]

Taking $\bar y = 0$, $\bar x^{**} = 0$, we see that for any solution $(y, r')$ of ext(M$_u'$) and for any $x \in X$ satisfying the requirements of ext(M$_u'$), one gets a critical point $(x, y)$ of the problem (M$_u'$). In turn, considering $(u, x)$ as an auxiliary variable and $y$ as the primary variable, one is led to the maximization problem (W$_u$). However, a solution $(x, y)$ of (W$_u$) should satisfy the extra conditions $g_u(x) \in C$, $\langle y, g_u(x) \rangle = 0$ in order to yield a solution to ext(M$_u'$).

Note that in the case of the quadratic problem

(Q) minimize $\frac{1}{2}\langle Qx, x \rangle + \langle q, x \rangle$ subject to $Ax - b \in C$,

where $Q: X \to X'$ is linear, continuous, and symmetric (but not necessarily positive semidefinite), $A: X \to W$, $q \in X'$, $b \in W$, $C$ being a closed convex cone of $W$, the Wolfe dual

(W) maximize $\frac{1}{2}\langle Qx, x \rangle + \langle q, x \rangle + \langle y, Ax - b \rangle$ over $(x, y) \in X \times Y$ subject to $Qx + q + y \circ A = 0$

is a simple quadratic problem with linear constraints. It can be given necessary and sufficient optimality conditions provided the map $(x, y) \mapsto Qx + y \circ A$ has a closed range in $X'$.
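A scalar instance makes the relation between (Q) and its Wolfe dual tangible. The following is a minimal sketch, assuming $X = W = \mathbb{R}$, $C = \mathbb{R}_+$ (so that $C^0 = \mathbb{R}_-$), and the illustrative data $Q = 1$, $q = 0$, $A = 1$, $b = 1$; the constraint $y \in C^0$ is taken from the definition of (W$_u$), and all function names are hypothetical. The equality constraint $Qx + q + yA = 0$ is used to eliminate $y$, and both problems are evaluated on a grid.

```python
Q, q, a, b = 1.0, 0.0, 1.0, 1.0  # illustrative data: minimize x^2/2 subject to x - 1 >= 0

def objective(x):
    """Primal objective (1/2) Q x^2 + q x."""
    return 0.5 * Q * x * x + q * x

def feasible(x):
    """Primal constraint a x - b in C = [0, +inf)."""
    return a * x - b >= 0.0

def lagrangian(x, y):
    """Classical Lagrangian l(x, y) = f(x) + <y, a x - b>."""
    return objective(x) + y * (a * x - b)

def wolfe_multiplier(x):
    """Multiplier y solving the Wolfe equality constraint Q x + q + y a = 0."""
    return -(Q * x + q) / a

grid = [i / 100.0 for i in range(-300, 301)]

primal_value = min(objective(x) for x in grid if feasible(x))
dual_value = max(
    lagrangian(x, wolfe_multiplier(x))
    for x in grid
    if wolfe_multiplier(x) <= 0.0  # y must lie in the polar cone C^0 = (-inf, 0]
)

print(primal_value, dual_value)
```

On this convex instance the two values coincide (both equal 1/2, attained at $x = 1$, $y = -1$), and the dual maximizer satisfies the extra conditions $g(x) \in C$, $\langle y, g(x) \rangle = 0$ mentioned above.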


10.9 The Clarke Duality 


Let $X$ be a reflexive Banach space, let $A: X \to X^*$ be a densely defined self-adjoint operator (i.e., such that $\langle Ax_1, x_2 \rangle = \langle x_1, Ax_2 \rangle$ for any $x_1, x_2 \in \mathrm{dom} A$) and let $g: X \to \mathbb{R} \cup \{+\infty\}$ be a l.s.c. proper convex function. Let $X' := X^*$ and let $J$ be given by

\[ J := \{(x, x', r) \in X \times X' \times \mathbb{R} : x' + Ax \in \partial g(x),\ r = j(x)\} \]

where

\[ j(x) := g(x) - \tfrac{1}{2}\langle Ax, x \rangle \text{ for } x \in \mathrm{dom} A \cap \mathrm{dom}\, g, \qquad j(x) := +\infty \text{ else.} \]

Let us consider the extremization problem of $J$:

(P) find $(x, r) \in X \times \mathbb{R}$ such that $Ax \in \partial g(x)$, $r = j(x)$.

Here we have taken $-A$ instead of $A$ as in [7, 16] and elsewhere in order to get a more symmetric form of the result; of course, this choice is inessential as we make no positiveness assumption on $A$. When $A$ is continuous, and when the subdifferential $\partial$ satisfies condition (T) (in particular for the Fréchet, the Hadamard, the moderate, and the Clarke subdifferentials), $J$ is the subjet of $j$ because in that case one has

\[ x' \in \partial j(x) \iff x' + Ax \in \partial g(x). \]

In particular, $x$ is a critical point of $j$ in the sense $0 \in \partial j(x)$ iff $Ax \in \partial g(x)$. Then (P) corresponds to the extremization of $j$.

Let us introduce a hyperperturbation of $J$ by setting $W := X^*$, $W' := X$, $X' := X^*$, and

\[ P := \{(w, x, x', x, j(x)) \in W \times \mathrm{dom}\, j \times X' \times W' \times \mathbb{R} : x' + Ax - w \in \partial g(x)\}. \]

In fact, we have

\[ \begin{aligned} P_0(x, x', r) &:= \{w' \in W' : (0_W, x, x', w', r) \in P\} \\ &= \{w' \in W' : w' = x,\ x' + Ax \in \partial g(x),\ r = j(x)\}, \end{aligned} \]

hence

\[ (x, x', r) \in \mathrm{dom}\, P_0 \iff x' + Ax \in \partial g(x),\ r = j(x) \iff (x, x', r) \in J, \]

so that $P$ is indeed a hyperperturbation of $J$ in the sense given above. Although we do not need the following result to proceed, it may serve as a guideline.


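Lemma 10.2. When $A$ is continuous and $\partial$ satisfies conditions (F), (P), (T), the set $P$ is the subjet of the function $f: W \times X \to \mathbb{R}_\infty$ given by

\[ f(w, x) := g(x) - \tfrac{1}{2}\langle Ax, x \rangle + \langle w, x \rangle \]

and $f$ is an Ekeland function.

Proof. When $A$ is continuous, $f$ is the sum of the continuously differentiable function $(w, x) \mapsto -\frac{1}{2}\langle Ax, x \rangle + \langle w, x \rangle$ with the convex function $(w, x) \mapsto g(x)$, and conditions (T), (P), and (F) ensure that

\[ (w', x') \in \partial f(w, x) \iff w' = x,\ x' + Ax - w \in \partial g(x). \qquad (10.5) \]

Then, for $(w', x') \in W' \times X'$ and for $(w, x) \in (\partial f)^{-1}(w', x')$ one has

\[ \begin{aligned} f^E(w', x') &= \langle w, w' \rangle + \langle x, x' \rangle - \left(g(x) - \tfrac{1}{2}\langle Ax, x \rangle + \langle w, x \rangle\right) \\ &= \langle w', x' \rangle + \tfrac{1}{2}\langle Aw', w' \rangle - g(w') \end{aligned} \]

and we see that this value does not depend on the choice of $(w, x) \in (\partial f)^{-1}(w', x')$: $f$ is an Ekeland function.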


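Let us return to the general case. In order to describe the dual problem
(P'), we observe that

$$\begin{aligned}
J' &= \{(w', w, r') \in W' \times W \times \mathbb{R} : \exists x \in X,\ (0_{X'}, w', w, x, r') \in P'\} \\
&= \{(w', w, r') \in W' \times W \times \mathbb{R} : \exists x \in X,\ (w, x, 0_{X'}, w', \langle w, w'\rangle - r') \in P\}
\end{aligned}$$

and

$$x \in P'_0(w', 0_W, r') \iff (0_W, x, 0_{X'}, w', -r') \in P$$

so that $(w', 0_W, r') \in J' = \operatorname{dom} P'_0$ iff there exists some $x \in \operatorname{dom} j \subset X$ such
that $Ax \in \partial g(x)$, $x = w'$, $r' = -f(0, x)$. Thus, because $g$ is convex and $A$ is
symmetric,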
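$$\begin{aligned}
(w', r') \in \operatorname{ext} J' &\iff w' \in \operatorname{dom} j,\ Aw' \in \partial g(w'),\ r' = -f(0, w') \\
&\implies w' \in \operatorname{dom} j,\ w' \in \partial g^*(Aw'),\ r' = -j(w') \\
&\implies w' \in \operatorname{dom} j,\ Aw' \in A(\partial g^*(Aw')) \subset \partial(g^* \circ A)(w'),\ r' = -j(w').
\end{aligned}$$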


In particular, when $\partial$ satisfies conditions (F) and (T) and $A$ is continuous,
for any $(w', r') \in \operatorname{ext} J'$, the pair $(w', -r')$ is a critical pair of the function
$j' : X \to \mathbb{R} \cup \{+\infty\}$ given by

$$j'(x) := g^*(Ax) - \tfrac{1}{2}\langle Ax, x\rangle.$$


This function is invariant by addition of an element of Ker A, thus we have 
obtained under these conditions the first part of the following statement which 
subsumes Clarke duality. In order to prove the second part we introduce the 
function $j''$ given by

$$j''(x) := (g^* \circ A)^*(Ax) - \tfrac{1}{2}\langle Ax, x\rangle.$$


Theorem 10.3. Suppose $g$ is l.s.c. proper convex, $\partial$ satisfies (F), (T), and
$A$ is continuous. Then,

(a) For any critical pair $(x, r)$ of $j$ and for any $u \in \operatorname{Ker} A$, the pair
$(x + u, -r)$ is a critical pair of $j'$.

(b) For any critical pair $(x', r')$ of $j'$ and for any $u \in \operatorname{Ker} A$, the pair
$(x' + u, -r')$ is a critical pair of $j''$. If moreover $g$ is convex and

$$\mathbb{R}_+(\operatorname{dom} g^* - A(X)) = X',$$

then there exists $u' \in \operatorname{Ker} A$ such that $(x' + u', -r')$ is a critical pair of $j$.


Proof. Because $j'$ has the same form as $j$, with $g$ replaced by $g^* \circ A$, we obtain
from part (a) that for any critical pair $(x', r')$ of $j'$ and for any $u \in \operatorname{Ker} A$,
the pair $(x' + u, -r')$ is a critical pair of

$$x \mapsto (g^* \circ A)^*(Ax) - \tfrac{1}{2}\langle Ax, x\rangle = j''(x).$$

On the other hand, $x'$ is a critical point of $j'$ means that

$$Ax' \in \partial(g^* \circ A)(x').$$

Now, under condition (C), the Attouch–Brézis theorem ensures the equalities
$\partial(g^* \circ A)(x') = A^T(\partial g^*(Ax')) = A(\partial g^*(Ax'))$, so that there exists some
$y' \in \partial g^*(Ax')$ such that

$$Ax' = Ay'.$$

Thus, one has $u' := y' - x' \in \operatorname{Ker} A$ and because $y' \in \partial g^*(Ax')$, by the
reciprocity formula, we get $Ax' \in \partial g(y')$, that is, $Ay' \in \partial g(y')$. Therefore,
$(x' + u', -r')$ is a critical pair of $j$. □
Chapter 11


Global Optimization in Practice: 
State of the Art and Perspectives 


Janos D. Pintér 


Summary. Global optimization—the theory and methods of finding the best 
possible solution in multiextremal models—has become a subject of interest 
in recent decades. Key theoretical results and basic algorithmic approaches 
have been followed by software implementations that are now used to handle 
a growing range of applications. This work discusses some practical aspects 
of global optimization. Within this framework, we highlight viable solution 
approaches, modeling environments, software implementations, numerical ex- 
amples, and real-world applications. 


Key words: Nonlinear systems analysis and management, global optimiza- 
tion strategies, modeling environments and global solver implementations, 
numerical examples, current applications and future perspectives 


11.1 Introduction 


Nonlinearity plays a fundamental role in the development of natural and 
man-made objects, formations, and processes. Consequently, nonlinear de- 
scriptive models are of key relevance across the range of quantitative sci- 
entific studies. For related discussions that illustrate this point consult, for 
instance, Bracken and McCormick (1968), Rich (1973), Mandelbrot (1983), 
Murray (1983), Casti (1990), Hansen and Jorgensen (1991), Schroeder (1991), 
Bazaraa et al. (1993), Stewart (1995), Grossmann (1996), Pardalos et al. 
(1996), Pintér (1996a, 2006a, 2009), Aris (1999), Bertsekas (1999), Corliss 
and Kearfott (1999), Floudas et al. (1999), Gershenfeld (1999), Papalam- 
bros and Wilde (2000), Chong and Zak (2001), Edgar et al. (2001), Gao et 
al. (2001), Jacob (2001), Pardalos and Resende (2002), Schittkowski (2002),
Tawarmalani and Sahinidis (2002), Wolfram (2002), Diwekar (2003), Sto- 
janovic (2003), Zabinsky (2003), Bornemann et al. (2004), Fritzson (2004), 
Neumaier (2004), Bartholomew-Biggs (2005), Hillier and Lieberman (2005), 
Lopez (2005), Nowak (2005), Kampas and Pintér (2009), as well as many 
other topical works. 

Decision support (control, management, or optimization) models that in- 
corporate an underlying nonlinear system description frequently have multi- 
ple—local and global—optima. The objective of global optimization (GO) is 
to find the “absolutely best” solution of nonlinear optimization models under 
such circumstances. 

We consider the general continuous global optimization (CGO) model de- 
fined by the following ingredients. 


• $x$: decision vector, an element of the real Euclidean $n$-space $\mathbb{R}^n$
• $l, u$: explicit, finite $n$-vector bounds of $x$ that define a “box” in $\mathbb{R}^n$
• $f(x)$: continuous objective function, $f : \mathbb{R}^n \to \mathbb{R}$
• $g(x)$: $m$-vector of continuous constraint functions, $g : \mathbb{R}^n \to \mathbb{R}^m$

Applying this notation, the CGO model is stated as

$$\min f(x) \tag{11.1}$$
$$x \in D := \{x : l \le x \le u,\ g(x) \le 0\}. \tag{11.2}$$


In (11.2) all vector inequalities are interpreted componentwise ($l$, $x$, $u$
are $n$-component vectors and the zero denotes an $m$-component vector). The
set of the additional constraints $g$ could be empty, thereby leading to box-
constrained GO models. Let us also note that formally more general optimization
models that also include $=$ and $\ge$ constraint relations and/or explicit
lower bounds on the constraint function values can be simply reduced to the
model form (11.1) and (11.2).
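
For instance, if $h$ and $c$ denote generic continuous functions and $b$ a given lower bound (names introduced here only for illustration), the reduction to the form $g(x) \le 0$ can be sketched as

$$h(x) = 0 \iff h(x) \le 0 \ \text{and}\ -h(x) \le 0, \qquad c(x) \ge b \iff b - c(x) \le 0,$$

so that each equality contributes two components, and each lower-bounded constraint contributes one component, to the vector function $g$.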

The CGO model is very general: in fact, it evidently subsumes linear pro- 
gramming and convex nonlinear programming models, under corresponding 
additional specifications. Furthermore, CGO also subsumes (formally) the 
entire class of pure and mixed integer programming problems. To see this, 
notice that all bounded integer variables can be represented by a corresponding
set of binary variables, and then every binary variable $y \in \{0, 1\}$ can be
equivalently represented by its continuous extension $y \in [0, 1]$ and the nonconvex
constraint $y(1 - y) \le 0$. Of course, this reformulation approach may
not be best (or even suitable) for “all” mixed integer optimization models:
however, it certainly shows the generality of the CGO model framework.
Without going into details, note finally that models with multiple (partially
conflicting) objectives are also often reduced to suitably parameterized collections
of CGO (or simpler optimization) models: this remark also hints at
the interchangeability of the objective f and one of the (active) constraints 
from g. 
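
The binary reformulation mentioned above can be checked numerically. The following minimal sketch (the helper name `is_feasible`, the grid, and the tolerance are illustrative assumptions) scans the interval [0, 1] and keeps the points that satisfy the nonconvex constraint $y(1 - y) \le 0$; only the two endpoints remain.

```python
def is_feasible(y, tol=1e-12):
    """Continuous reformulation of a binary variable:
    0 <= y <= 1 together with the nonconvex constraint y*(1 - y) <= 0."""
    return 0.0 <= y <= 1.0 and y * (1.0 - y) <= tol


# Scan a uniform grid on [0, 1]; interior points violate y*(1 - y) <= 0.
grid = [k / 1000 for k in range(1001)]
feasible = [y for y in grid if is_feasible(y)]
print(feasible)  # [0.0, 1.0]
```

The feasible set of the continuous model therefore coincides with {0, 1}, at the price of a disconnected (nonconvex) feasible region.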
Let us observe next that if D is nonempty, then the above-stated basic
analytical assumptions guarantee that the optimal solution set X* in the 
CGO model is nonempty. This result directly follows by the classical theorem 
of Weierstrass that states the existence of the global minimizer point—or, in 
general, a set of such points—of a continuous function over a nonempty, 
bounded, and closed (compact) set. 

For reasons of numerical tractability, the following additional requirements 
are also often postulated. 


• $D$ is a full-dimensional subset (“body”) in $\mathbb{R}^n$.

• The set of globally optimal solutions to (11.1) and (11.2) is at most countable.

• $f$ and $g$ (componentwise) are Lipschitz-continuous functions on $[l, u]$.


Without going into technical details, notice that the first of these assump- 
tions (the set D is the closure of its nonempty interior) makes algorithmic 
search easier (or at all possible) within D. The second assumption supports 
theoretical convergence results: note that in most well-posed practical GO 
problems the set of global optimizers consists either of a single point $x^*$ or
at most of several points. The third assumption is a sufficient condition for
estimating $f^* = f(x^*)$ on the basis of a finite set of generated feasible search
points. (Recall that the real-valued function $h$ is Lipschitz-continuous on its
domain of definition $D \subset \mathbb{R}^n$, if $|h(x_1) - h(x_2)| \le L\|x_1 - x_2\|$ holds for all
pairs $x_1 \in D$, $x_2 \in D$; here $L = L(D, h)$ is a suitable Lipschitz-constant
of $h$ on the set $D$.) We emphasize that the exact knowledge of the smallest
suitable Lipschitz-constant for each model function is not required, and
in practice such information is typically unavailable. At the same time, all 
models defined by continuously differentiable functions f and g belong to the 
CGO or even to the Lipschitz model-class. 
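
To illustrate how a Lipschitz-constant overestimate turns a finite sample into bounds on the optimum value, consider the following minimal one-dimensional sketch. The test function $\cos(x)\sin(x^2 - x)$ on $[0, 10]$, the constant $L = 20$ (obtained from $|f'(x)| \le 1 + |2x - 1|$), and the function name `lipschitz_bounds` are our illustrative choices; this is not the estimation procedure of any particular solver.

```python
import math


def f(x):
    # One-dimensional multiextremal test function
    return math.cos(x) * math.sin(x * x - x)


def lipschitz_bounds(func, a, b, L, n):
    """Bracket min func over [a, b] from n + 1 equally spaced samples,
    given a valid Lipschitz-constant overestimate L."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    fs = [func(x) for x in xs]
    upper = min(fs)  # best sampled value: an upper bound on the minimum
    # On [x_k, x_{k+1}] every L-Lipschitz function h satisfies
    # h(x) >= (h(x_k) + h(x_{k+1})) / 2 - L * (x_{k+1} - x_k) / 2.
    lower = min(
        (fs[k] + fs[k + 1]) / 2 - L * (xs[k + 1] - xs[k]) / 2
        for k in range(n)
    )
    return lower, upper


L = 20.0  # |f'(x)| <= 1 + |2x - 1| <= 20 on [0, 10]
lower, upper = lipschitz_bounds(f, 0.0, 10.0, L, 10000)
print(lower, upper)
```

With equally spaced samples the gap between the two bounds is at most $L$ times half the grid spacing, which shows why a large Lipschitz constant (or a high dimension) quickly makes uniform sampling impractical.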

The notes presented above imply that the CGO model-class covers a very 
broad range of optimization problems. As a consequence of this generality, 
it includes also many model instances that are difficult to solve numerically. 
For illustration, a merely one-dimensional, box-constrained GO model based 
on the formulation (11.3) is shown in Figure 11.1. 


$$\min\ \cos(x)\sin(x^2 - x), \qquad 0 \le x \le 10. \tag{11.3}$$
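
The multiextremal character of model (11.3) can also be seen without a plot. The minimal sketch below evaluates the objective on a fine uniform grid and collects the grid points that are lower than both neighbors; the grid size and the helper name `grid_local_minima` are arbitrary illustrative choices, and a grid scan of this kind is only a rough diagnostic, not a solution method.

```python
import math


def f(x):
    # Objective function of model (11.3)
    return math.cos(x) * math.sin(x * x - x)


def grid_local_minima(func, a, b, n):
    """Return (x, f(x)) for interior grid points lower than both neighbors."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    fs = [func(x) for x in xs]
    return [
        (xs[k], fs[k])
        for k in range(1, n)
        if fs[k] < fs[k - 1] and fs[k] < fs[k + 1]
    ]


minima = grid_local_minima(f, 0.0, 10.0, 100000)
best_x, best_f = min(minima, key=lambda p: p[1])
print(len(minima), best_x, best_f)
```

The scan reports many distinct local minima on the interval, several of them with objective values close to the best one found.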


Model complexity often increases dramatically (in fact, it can grow ex- 
ponentially) as the model size expressed by the number of variables and 
constraints grows. To illustrate this point, Figure 11.2 shows the objective 
function in the model (11.4) that is simply generalized from (11.3) as 


$$\min\ \cos(x)\sin(y^2 - x) + \cos(y)\sin(x^2 - y), \qquad 0 \le x \le 10,\ 0 \le y \le 10. \tag{11.4}$$


The presented two (low-dimensional, and only box-constrained) models
already indicate that GO models (for instance, further extensions of model
(11.3), perhaps with added complicated nonlinear constraints) could become
truly difficult to handle numerically. One should also point out here that a
direct analytical solution approach is viable only in very special cases, because
in general (under further structural assumptions) one should investigate all
Kuhn–Tucker points (minimizers, maximizers, and saddle points) of the CGO
model. (Think of carrying out this analysis for the model depicted in Figure
11.2, or for its 100-dimensional extension.)

Fig. 11.1 The objective function in model (11.3).

Fig. 11.2 The objective function in model (11.4).

Arguably, not all GO models are as difficult as indicated by Figures 11.1 
and 11.2. At the same time, we typically do not have the possibility to directly 
inspect, visualize, or estimate the overall numerical difficulty of a complicated 
nonlinear (global) optimization model. A practically important case is when 
one needs to optimize the parameters of a model that has been developed by 
someone else. The model may be confidential, or just visibly complex; it could 
even be presented to the optimization engine as a compiled (object, library, 
or similar) software module. In such situations, direct model inspection and 
structure verification are not possible. In other practically relevant cases, the 
evaluation of the optimization model functions may require the numerical 
solution of a system of embedded differential and/or algebraic equations, the 
evaluation of special functions, integrals, the execution of other deterministic 
computational procedures or stochastic simulation modules, and so on. 

Traditional nonlinear optimization methods (discussed in most topical 
textbooks such as Bazaraa et al., 1993, Bertsekas, 1999, Chong and Zak, 
2001, and Hillier and Lieberman, 2005) search only for local optima. This 
generally followed approach is based on the tacit assumption that a “sufficiently
good” initial solution (that is located in the region of attraction of the
“true” global solution) is available. Figures 11.1 and 11.2 and the practical 
situations mentioned above suggest that this may not always be a realistic 
assumption. Nonlinear models with less “dramatic” difficulty, but in (perhaps 
much) higher dimensions may also require global optimization. For instance, 
in advanced engineering design, optimization models with hundreds, thou- 
sands, or more variables and constraints are analyzed and need to be solved. 
In similar cases, even an approximately completed, but genuinely global scope 
search strategy may (and typically will) yield better results than the most 
sophisticated local search approach “started from the wrong valley”. This 
fact has motivated research to develop practical GO strategies. 
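
The effect of a local search “started from the wrong valley” can be reproduced on model (11.3) with a few lines of code. The sketch below is an illustration under our own assumptions: the local method is a simple derivative-free compass search (not one of the textbook methods cited above), and the starting point, step sizes, number of starts, and seed are arbitrary.

```python
import math
import random


def f(x):
    # Objective function of model (11.3)
    return math.cos(x) * math.sin(x * x - x)


def local_search(func, x0, lo, hi, step=0.1, tol=1e-8):
    """One-dimensional compass search: move to a better neighbor,
    otherwise halve the step until it falls below tol."""
    x, fx = x0, func(x0)
    while step > tol:
        improved = False
        for cand in (x - step, x + step):
            cand = min(max(cand, lo), hi)  # keep the trial point in the box
            fc = func(cand)
            if fc < fx:
                x, fx, improved = cand, fc, True
                break
        if not improved:
            step /= 2
    return x, fx


def multistart(func, lo, hi, n_starts, seed=0):
    """Run the local search from uniformly sampled starting points."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x, fx = local_search(func, rng.uniform(lo, hi), lo, hi)
        if best is None or fx < best[1]:
            best = (x, fx)
    return best


single = local_search(f, 1.0, 0.0, 10.0)   # one local search from x0 = 1
multi = multistart(f, 0.0, 10.0, 200)      # global-scope sampling + local search
print(single, multi)
```

The single run stalls in a shallow valley near the starting point, whereas even this crude global-scope strategy returns a markedly better objective value.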


11.2 Global Optimization Strategies 


As of today, well over a hundred textbooks and an increasing number of Web 
sites are devoted (partly or completely) to global optimization. Added to 
this massive amount of information is a very substantial body of literature 
on combinatorial optimization (CO), the latter being, at least in theory, a 
“subset of GO.” The most important global optimization model types and 
(mostly exact, but also several prominent heuristic) solution approaches are 
discussed in detail by the Handbook of Global Optimization volumes, edited
by Horst and Pardalos (1995), and by Pardalos and Romeijn (2002). We also 
refer to the topical Web site of Neumaier (2006), with numerous links to other 
useful information sources. The concise review of GO strategies presented 
here draws on these sources, as well as on the more detailed expositions in 
Pintér (2001a, 2002b). Let us point out that some of the methods listed 
below are more often used in solving CGO models, whereas others have been 
mostly applied so far to handle CO models. Because CGO formally includes 
CO, it should not be surprising that approaches suitable for certain specific 
CO model-classes can (or could) be put to good use to solve CGO models. 

Instead of a more detailed (but still not unambiguous) classification, here 
we simply classify GO methods into two primary categories: exact and heuris- 
tic. Exact methods possess theoretically established (deterministic or sto- 
chastic) global convergence properties. That is, if such a method could be 
carried out completely as an infinite iterative process, then the generated 
limit point(s) would belong to the set of global solutions $X^*$. (For a single
global solution $x^*$, this would be the only limit point.) In the case of stochastic
GO methods, the above statement is valid “only” with probability one. In
practice, after a finite number of algorithmic search steps, one can only expect
a numerically validated or estimated (deterministic or stochastic) lower
bound for the global optimum value $z^* = f(x^*)$, as well as a best feasible
or near-feasible global solution estimate. We emphasize that to produce such
estimates is not a trivial task, even for implementations of theoretically well-established
algorithms. As a cautionary note, one can conjecture that there is
no GO method, and never will be one, that can solve “all” CGO models with
a certain number of variables to an arbitrarily given precision (in terms of the
argument $x^*$), within a given time frame, or within a preset model function
evaluation count. To support this statement, please recall Figures 11.1 and
11.2: both of the objective functions displayed could be made arbitrarily more
difficult, simply by changing the frequencies and amplitudes of the embedded
trigonometric terms. We do not attempt to display such “monster” functions,
because even the best visualization software will soon become inadequate:
think for instance of a function such as $1000\cos(1000x)\sin(1000(x^2 - x))$.
For a more practically motivated example, one can also think of solving a 
difficult system of nonlinear equations: here, after a prefixed finite number of 
model function evaluations, we may not have an “acceptable” approximate 
numerical solution. 

Heuristic methods do not possess similar convergence guarantees to those 
of exact methods. At the same time, they may provide good quality solu- 
tions in many difficult GO problems, assuming that the method in question 
suits the specific model type (structure) solved. Here a different caution- 
ary note is in order. Because such methods are often based on some generic 
metaheuristics, overly optimistic claims regarding the “universal” efficiency 
of their implementations are often not supported by results in solving truly 
difficult, especially nonlinearly constrained, GO models. In addition, heuristic
metastrategies are often more difficult to adjust to new model types than
some of the solver implementations based on exact algorithms. Exact sto- 
chastic methods based on direct sampling are a good example for the latter 
category, because these can be applied to “all” GO models directly, without 
a need for essential code adjustments and tuning. This is in contrast, for ex- 
ample, to most population-based search methods in which the actual steps 
of generating new trial solutions may depend significantly on the structure 
of the model-instance solved. 


11.2.1 Exact Methods 


• “Naive” approaches (grid search, pure random search): these are obviously
convergent, but in general “hopeless” as the problem size grows.

• Branch-and-bound methods: these include interval-arithmetic-based strategies,
as well as customized approaches for Lipschitz global optimization
and for certain classes of difference of convex functions (D.C.) models.
Such methods can also be applied to constraint satisfaction problems and
to (general) pure and mixed integer programming.

• Homotopy (path following, deformation, continuation, trajectory, and related
other) methods: these are aimed at finding the set of global solutions
in smooth GO models.

• Implicit enumeration techniques: examples are vertex enumeration in concave
minimization models, and generic dynamic programming in the context
of combinatorial optimization.

• Stochastically convergent sequential sampling methods: these include adaptive
random searches, single- and multistart methods, Bayesian search
strategies, and their combinations.
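
As an illustration of the branch-and-bound idea in the Lipschitzian setting, the following minimal one-dimensional sketch repeatedly bisects the subinterval with the smallest lower bound. It is a textbook-style sketch under our own assumptions (objective of model (11.3), overestimate $L = 20$ from $|f'(x)| \le 1 + |2x - 1|$, tolerance and iteration limit chosen arbitrarily), not the implementation used in any of the solvers discussed in this chapter.

```python
import heapq
import math


def f(x):
    # Objective function of model (11.3)
    return math.cos(x) * math.sin(x * x - x)


def lipschitz_bb(func, a, b, L, eps=1e-4, max_iter=100000):
    """Bisect the interval with the smallest Lipschitz lower bound until the
    incumbent value is within eps of that bound (or max_iter is reached)."""
    def bound(lo, flo, hi, fhi):
        # Valid lower bound for an L-Lipschitz function on [lo, hi]
        return (flo + fhi) / 2 - L * (hi - lo) / 2

    fa, fb = func(a), func(b)
    best_x, best_f = (a, fa) if fa <= fb else (b, fb)
    heap = [(bound(a, fa, b, fb), a, fa, b, fb)]
    lb = heap[0][0]
    for _ in range(max_iter):
        lb, lo, flo, hi, fhi = heapq.heappop(heap)  # smallest bound overall
        if best_f - lb <= eps:
            break
        mid = (lo + hi) / 2
        fmid = func(mid)
        if fmid < best_f:
            best_x, best_f = mid, fmid
        heapq.heappush(heap, (bound(lo, flo, mid, fmid), lo, flo, mid, fmid))
        heapq.heappush(heap, (bound(mid, fmid, hi, fhi), mid, fmid, hi, fhi))
    return best_x, best_f, lb


x_best, f_best, lower = lipschitz_bb(f, 0.0, 10.0, 20.0)
print(x_best, f_best, lower)
```

In contrast to a uniform grid, the partition is refined only where the lower bound cannot yet exclude a global solution, and the returned pair `(f_best, lower)` brackets the global optimum value.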


For detailed expositions related to deterministic GO techniques in addition 
to the Handbooks mentioned earlier, consult, for example, Horst and Tuy 
(1996), Kearfott (1996), Pintér (1996a), Tawarmalani and Sahinidis (2002), 
Neumaier (2004), and Nowak (2005). On stochastic GO strategies, consult, for 
example, Zhigljavsky (1991), Boender and Romeijn (1995), Pintér (1996a), 
and Zabinsky (2003). 


11.2.2 Heuristic Methods 


• Ant colony optimization is based on individual search steps and “ant-like”
interaction (communication) between search agents.

• Basin-hopping strategies are based on a sequence of perturbed local
searches, in an effort to find improving optima.

• Convex underestimation attempts are based on a limited sampling effort
that is used to estimate a postulated (approximate) convex objective function
model.

• Evolutionary search methods model the behavioral linkage among the
adaptively changing set of candidate solutions (“parents” and their “children,”
in a sequence of “generations”).

• Genetic algorithms emulate specific genetic operations (selection, crossover,
mutation) as these are observed in nature, similarly to evolutionary methods.

• Greedy adaptive search strategies (a metaheuristic often used in combinatorial
optimization) construct “quick and promising” initial solutions
which are then refined by a suitable local optimization procedure.

• Memetic algorithms are inspired by analogies to cultural (as opposed to
natural) evolution.

• Neural networks are based on a model of the parallel architecture of the
brain.

• Response surface methods (directed sampling techniques) are often used
in handling expensive “black box” optimization models by postulating and
then gradually adapting a surrogate function model.

• Scatter search is similar in its algorithmic structure to ant colony, genetic,
and evolutionary searches, but without their “biological inspiration.”

• Simulated annealing methods are based on the analogy of cooling crystal
structures that will attain a (low-energy level, stable) physical equilibrium
state.

• Tabu search forbids or penalizes search moves which take the solution
in the next few iterations to points in the solution space that have been
previously visited. (Tabu search as outlined here has been typically applied
in the context of combinatorial optimization.)

• Tunneling strategies, filled function methods, and other similar methods
attempt to sequentially find an improving sequence of local optima, by
gradually modifying the objective function to escape from the solutions
found.
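
To make one of the listed heuristics concrete, the following minimal sketch applies a basic simulated annealing scheme to the box-constrained model (11.3). All algorithmic details (Gaussian trial steps, a linear cooling schedule, the Metropolis acceptance rule, and the parameter values) are common illustrative choices assumed here; they are not taken from any of the works cited in this section, and no convergence guarantee is implied.

```python
import math
import random


def f(x):
    # Objective function of model (11.3)
    return math.cos(x) * math.sin(x * x - x)


def simulated_annealing(func, lo, hi, n_iter=20000, t0=1.0, seed=1):
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)
    fx = func(x)
    best_x, best_f = x, fx
    for k in range(n_iter):
        # Linear cooling schedule (an arbitrary choice), kept strictly positive
        t = t0 * (1.0 - k / n_iter) + 1e-9
        # Gaussian trial step, clipped to the box [lo, hi]
        cand = min(max(x + rng.gauss(0.0, 0.5), lo), hi)
        fc = func(cand)
        # Metropolis rule: accept improvements, sometimes accept deteriorations
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


sa_x, sa_f = simulated_annealing(f, 0.0, 10.0)
print(sa_x, sa_f)
```

The quality of the result depends on the cooling schedule and step size, which is one reason why such metastrategies may need model-specific tuning.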


In addition to the earlier mentioned topical GO books, we refer here to 
several works that discuss mostly combinatorial (but also some continuous) 
global optimization models and heuristic strategies. For detailed discussions 
of theory and applications, consult, for example, Michalewicz (1996), Os- 
man and Kelly (1996), Glover and Laguna (1997), Voss et al. (1999), Jacob 
(2001), Ferreira (2002), Rothlauf (2002), and Jones and Pevzner (2004). It is 
worth pointing out that Rudolph (1997) discusses the typically missing theo- 
retical foundations for evolutionary algorithms, including stochastic conver- 
gence studies. (The underlying key convergence results for adaptive stochastic 
search methods are discussed also in Pintér (1996a).) The topical chapters in 
Pardalos and Resende (2002) also offer expositions related to both exact and 
heuristic GO approaches. 
To conclude this very concise review, let us emphasize again that numerical
GO can be tremendously difficult. Therefore it can be good practice to try 
several—perhaps even radically different—search approaches to tackle GO 
models, whenever this is possible. To do this, one needs ready-to-use model 
development and optimization software tools. 


11.3 Nonlinear Optimization in Modeling Environments 


Advances in modeling techniques, solver engine implementations and com- 
puter technology have led to a rapidly growing interest in modeling environ- 
ments. For detailed discussions consult, for example, the topical Annals of 
Operations Research volumes edited by Maros and Mitra (1995), Maros et 
al. (1997), Vladimirou et al. (2000), Coullard et al. (2001), as well as the vol- 
umes edited by Voss and Woodruff (2002) and by Kallrath (2004). Additional 
useful information is provided by the Web sites of Fourer (2006), Mittelmann 
(2006), and Neumaier (2006), with numerous further links. Prominent ex- 
amples of widely used modeling systems that are focused on optimization 
include AIMMS (Paragon Decision Technology, 2006), AMPL (Fourer et al.,
1993), the Excel Premium Solver Platform (Frontline Systems, 2006), GAMS
(Brooke et al., 1988), ILOG (2004), the LINDO Solver Suite (LINDO Sys- 
tems, 2006), MPL (Maximal Software, 2006), and TOMLAB (2006). (Please 
note that the literature references cited may not always reflect the current 
status of the modeling systems discussed here: for the latest information, 
contact the developers and/or visit their Web sites.) 

There also exists a large variety of core compiler platform-based solver systems
with more or less built-in model development functionality: in principle,
such solvers can be linked to the modeling languages listed above. 

At the other end of the spectrum, there is also notable development in re- 
lation to integrated scientific and technical computing (ISTC) systems such 
as Maple (Maplesoft, 2006), Mathematica (Wolfram Research, 2006), Math- 
cad (Mathsoft, 2006), and MATLAB (The MathWorks, 2006). From among 
the many hundreds of books discussing ISTC systems, we mention here as 
examples the works of Birkeland (1997), Bhatti (2000), Parlar (2000), Wright 
(2002), Wilson et al. (2003), Moler (2004), Wolfram (2003), Trott (2004), and 
Lopez (2005). The ISTC systems offer a growing range of optimization-related 
features, either as built-in functions or as add-on products. 

The modeling environments listed above are aimed at meeting the needs of 
different types of users. User categories include educational (instructors and 
students); research scientists, engineers, consultants, and other practitioners 
(possibly, but not necessarily equipped with an in-depth optimization-rela- 
ted background); optimization experts, software application developers, and 
other “power users.” (Observe that the user categories listed are not necessarily
disjoint.) The pros and cons of the individual software products—in terms
of their hardware and software demands, ease of usage, model prototyping
options, detailed code development and maintenance features, optimization 
model checking and processing tools, availability of solver options and other 
auxiliary tools, program execution speed, overall level of system integration, 
quality of related documentation and support, customization options, and 
communication with end users—make the corresponding modeling and solver 
approaches more or less attractive for the various user groups. 

Given the almost overwhelming amount of topical information, in short, 
which are the currently available platform and solver engine choices for the 
GO researcher or practitioner? A software review that is now more than a decade old
(Pintér, 1996b; also available at the Web site of Mittelmann, 2006) listed a few
dozen individual software products, including several Web sites with further 
software collections. Neumaier’s (2006) Web page currently lists more than 
100 software development projects. Both of these Web sites include general- 
purpose solvers, as well as application-specific products. (It is noted that 
quite a few of the links in these software listings are now obsolete, or have 
been changed.) 

The user’s preference obviously depends on many factors. A key question 
is whether one prefers to use “free” (noncommercial, research, or even open 
source) code, or looks for a “ready-to-use” professionally supported commer- 
cial product. There is a significant body of freely available solvers, although 
the quality of solvers and their documentation arguably varies. (Of course, 
this remark could well apply also to commercial products.) 

Instead of trying to impose personal judgment on any of the products 
mentioned in this work, the reader is encouraged to do some Web browsing 
and experimentation, as his or her time and resources allow. Both Mittel- 
mann (2006) and Neumaier (2006) provide more extensive information on 
noncommercial, as opposed to commercial, systems. Here we mention several 
software products that are part of commercial systems, typically as an add-on 
option, but in some cases as a built-in option. Needless to say, although this 
author (being also a professional software developer) may have opinions, the 
alphabetical listing presented below is strictly matter-of-fact. We list only 
currently available products that are explicitly targeted towards global op- 
timization, as advertised by the Web sites of the listed companies. For this 
reason, nonlinear (local) solvers are, as a rule, not listed here; furthermore, 
we do not list modeling environments that currently have no global solver 
options. 

AIMMS, by Paragon Decision Technology (www.aimms.com). The BARON 
and LGO global solver engines are offered with this modeling system as add- 
on options. 

Excel Premium Solver Platform (PSP), by Frontline Systems (www.solver.com).
The developers of the PSP offer a global presolver option to be
used with several of their local optimization engines: these currently include
LSGRG, LSSQP, and KNITRO. Frontline Systems also offers (as
genuine global solvers) an Interval Global Solver, an Evolutionary Solver,
and OptQuest.

GAMS, by the GAMS Development Corporation (www.gams.com). Cur- 
rently, BARON, DICOPT, LGO, MSNLP, OQNLP, and SBB are offered as 
solver options for global optimization. 

LINDO, by LINDO Systems (www.lindo.com). Both the LINGO modeling 
environment and What’sBest! (the company’s spreadsheet solver) have built- 
in global solver functionality. 

Maple, by Maplesoft (www.maplesoft.com) offers the Global Optimization
Toolbox as an add-on product. 

Mathematica, by Wolfram Research (www.wolfram.com) has a built-in 
function (called NMinimize) for numerical global optimization. In addition, 
there are several third-party GO packages that can be directly linked to Math- 
ematica: these are Global Optimization, MathOptimizer, and MathOptimizer 
Professional. 

MPL, by Maximal Software (www.maximal-usa.com). The LGO solver 
engine is offered as an add-on. 

TOMLAB, by TOMLAB Optimization AB (www.tomopt.com) is an opti- 
mization platform for solving MATLAB models. The TOMLAB global solvers 
include CGO, LGO, MINLP, and OQNLP. Note that MATLAB’s own Ge- 
netic Algorithm and Direct Search Toolboxes also have heuristic global solver 
capabilities. 

To illustrate the functionality and usage of global optimization software, 
next we review the key features of the LGO solver engine, and then apply its 
Maple platform-specific implementation in several numerical examples. 


11.4 The LGO Solver Suite and Its Implementations 


11.4.1 LGO: Key Features 


The Lipschitz Global Optimizer (LGO) solver suite has been developed and 
used for more than a decade. The top-level design of LGO is based on the 
seamless combination of theoretically convergent global and efficient local 
search strategies. Currently, LGO offers the following solver options. 


• Adaptive partition and search (branch-and-bound) based global search
(BB)

• Adaptive global random search (single-start) (GARS)

• Adaptive global random search (multistart) (MS)

• Constrained local search by the generalized reduced gradient (GRG)
method (LS).


In a typical LGO optimization run, the user selects one of the global (BB, 
GARS, MS) solver options; this search phase is then automatically followed 
by the LS option. It is also possible to apply only the LS solver option, making
use of an automatically set (default) or a user-supplied initial solution. 

The global search methodology implemented in LGO is based on the de- 
tailed exposition in Pintér (1996a), with many added numerical features. The 
well-known GRG method is discussed in numerous articles and textbooks; 
consult for instance Edgar et al. (2001). Therefore only a very brief overview 
of the LGO component algorithms is provided here. 

BB, GARS, and MS are all based on globally convergent search methods. 
Specifically, in Lipschitz-continuous models with suitable Lipschitz-constant 
(over)estimates for all model functions BB theoretically generates a sequence 
of search points that will converge to the global solution point. If there is a 
countable set of such optimal points, then a convergent search point sequence 
will be generated in association with each of these. 

In a GO model with a continuous structure (but without postulating ac- 
cess to Lipschitz information), both GARS and MS are globally convergent, 
with probability one (w.p. 1). In other words, the sequence of points that 
is associated with the generated sequence of global optimum estimates will 
converge to a point which belongs to X*, with probability one. (Again, if sev- 
eral such convergent point sequences are generated by the stochastic search 
procedure, then each of these sequences has a corresponding limit point in 
X*, w.p. 1.) 

The LS method (GRG) is aimed at finding a locally optimal solution that 
satisfies the Karush—Kuhn—Tucker system of necessary local optimality con- 
ditions, assuming standard model smoothness and regularity conditions. 

In all three global search modes the model functions are aggregated by 
an exact penalty (merit) function. By contrast, in the local search phase all 
model functions are considered and handled individually. The global search 
phases incorporate both deterministic and stochastic sampling procedures: 
the latter support the usage of statistical bound estimation methods, under 
basic continuity assumptions. All LGO component algorithms are derivative- 
free. In the global search phase, BB, GARS, and MS use only direct sampling 
information based on generated points and corresponding model function 
values. In the LS phase central differences are used to approximate function 
gradients (under a postulated locally smooth model structure). This direct 
search approach reflects our objective to handle also models defined by merely 
computable, continuous functions, including completely “black box” systems. 
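
The two ingredients mentioned above can be sketched in a few lines. The text does not specify the form of the exact penalty function, so the l1 form below is an assumption, as are the helper names `l1_merit` and `central_gradient` and the small example model; this is not the LGO code.

```python
def l1_merit(objective, eq_constraints, ineq_constraints, penalty):
    """Aggregate f(x), g_i(x) = 0 and h_j(x) <= 0 into one l1 penalty function."""
    def merit(x):
        violation = sum(abs(g(x)) for g in eq_constraints)
        violation += sum(max(0.0, h(x)) for h in ineq_constraints)
        return objective(x) + penalty * violation
    return merit

def central_gradient(f, x, step=1e-6):
    """Approximate the gradient of f at x by central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        grad.append((f(forward) - f(backward)) / (2.0 * step))
    return grad

# Hypothetical example model: minimize x0^2 + x1^2 s.t. x0 + x1 = 1, x0 >= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: x[0] + x[1] - 1.0      # equality constraint g(x) = 0
h = lambda x: -x[0]                  # inequality constraint h(x) <= 0
merit = l1_merit(f, [g], [h], penalty=10.0)

print(merit([0.5, 0.5]))             # feasible point: merit equals objective
print(merit([0.0, 0.0]))             # infeasible point: penalized
print(central_gradient(f, [1.0, 2.0]))
```

Only function values enter both routines, which is what allows such a scheme to be applied to models given as black-box procedures.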

In numerical practice—with finite runs, and user-defined or default option 
settings—the LGO global solver options generate a global solution estimate 
that is subsequently refined by the local search mode. If the LS mode is 
used without a preceding global search phase, then LGO serves as a general- 
purpose local solver engine. The expected practical outcome of using LGO to 
solve a model (barring numerical problems which could impede any numerical 
method) is a global-search-based feasible solution that meets at least the local 
optimality conditions. Extensive numerical tests and a range of practical 
applications demonstrate that LGO can locate the global solution not only 
in the usual academic test problems, but also in more complicated, sizeable
GO models: this point is illustrated later on in Sections 11.5 and 11.6. (At 
the same time, keep in mind the caveats mentioned earlier regarding the 
performance of any global solver: nothing will “always” work satisfactorily, 
under resource limitations.) 


11.4.2 LGO Implementations 


The current platform-specific implementations include the following. 


• LGO with a text input/output interface, for C and FORTRAN compiler
  platforms

• LGO integrated development environment with a Microsoft Windows style
  menu interface, for C and FORTRAN compiler platforms

• AIMMS/LGO solver engine

• AMPL/LGO solver engine

• GAMS/LGO solver engine

• Global Optimization Toolbox for Maple (the LGO solver linked to Maple
  as a callable add-on package)

• MathOptimizer Professional, with an LGO solver engine link to
  Mathematica

• MPL/LGO solver engine

• TOMLAB/LGO, for MATLAB users


Technical descriptions of these software implementations, including de- 
tailed numerical tests and a range of applications, have appeared elsewhere. 
For implementation details and illustrative results, consult Pintér (1996a, 
1997, 2001a,b, 2002a,b, 2003b, 2005), as well as Pintér and Kampas (2003) 
and Pintér et al. (2004, 2006). 

The compiler-based LGO solver suite can be used in standalone mode, and 
also as a solver option in various modeling environments. In its core (text in- 
put/output based) implementation version, LGO reads an input text file that 
contains application-specific (model descriptor) information, as well as a few 
key solver options (global solver type, precision settings, resource and time 
limits). During the program run, LGO makes calls to an application-specific 
model function file that returns function values for the algorithmically chosen 
sequence of arguments. Upon completing the LGO run, automatically gener- 
ated summary and detailed report files are available. As can be expected, this 
LGO version has the lowest demands for hardware; it also runs fastest, and 
it can be directly embedded into various decision support systems, including 
proprietary user applications. The same core LGO system is also available 
in directly callable form, without reading and writing text files: this version
is frequently used as a built-in solver module in other (general-purpose or 
customized modeling) systems. 

LGO can also be equipped, as a readily available (implemented) option,
with a Microsoft Windows style menu interface. This enhanced version is 
referred to as the LGO Integrated Development Environment (IDE). The 
LGO IDE supports model development, compilation, linking, execution, and 
the inspection of results, together with built-in basic help facilities. 

In the two LGO implementations mentioned above, models can be con- 
nected to LGO using one of several programming languages that are avail- 
able on personal computers and workstations. Currently supported platforms 
include, in principle, “all” professional FORTRAN 77/90/95 and C/C++ 
compilers. Examples of supported compilers include Compaq, Intel, Lahey,
and Salford FORTRAN, as well as g77 and g95, and Borland and Microsoft 
C/C++. Other customized versions (to use with other compilers or software 
applications) can also be made available upon request. 

In the optimization modeling language (AIMMS, AMPL, GAMS, and 
MPL) or ISTC (Maple, Mathematica, and TOMLAB) environments the core 
LGO solver engine is seamlessly linked to the corresponding modeling plat- 
form, as a dynamically callable or shared library, or as an executable program. 
The key advantage of using LGO within a modeling or ISTC environment 
is the combination of modeling-system-specific features, such as model pro- 
totyping and detailed development, model consistency checking, integrated 
documentation, visualization, and other platform-specific features, with a nu- 
merical performance comparable to that of the standalone LGO solver suite. 

For peer reviews of several of the listed implementations, the reader is 
referred to Benson and Sun (2000) on the core LGO solver suite, Cogan 
(2003) on MathOptimizer Professional, and Castillo (2005), Henrion (2006), 
and Wass (2006) on the Global Optimization Toolbox for Maple. Let us 
also mention here that LGO serves to illustrate global optimization software 
(in connection with a demo version of the MPL modeling system) in the 
prominent O.R. textbook by Hillier and Lieberman (2005). 


11.5 Illustrative Examples 


In order to present some small-scale, yet nontrivial numerical examples, in 
this section we illustrate the functionality of the LGO software as it is im- 
plemented in the Global Optimization Toolbox (GOT) for Maple. 

Maple (Maplesoft, 2006) enables the development of interactive documents 
called worksheets. Maple worksheets can incorporate technical model descrip- 
tion, combined with computing, programming, and visualization features. 
Maple includes several thousands of built-in (directly callable) functions to 
support the modeling and computational needs of scientists and engineers. 
Maple also offers a detailed online help and documentation system with 
ready-to-use examples, topical tutorials, manuals, and Web links, as well as 
a built-in mathematical dictionary. Application development is assisted by 
debugging tools, and automated (ANSI C, FORTRAN 77, Java, Visual Basic,
and MATLAB) code generation. Document production features include
HTML, MathML, TeX, and RTF converters. These capabilities accelerate 
and expand the scope of the optimization model development and solution 
process. Maple, similarly to other modeling environments, is portable across 
all major hardware platforms and operating systems (including Windows, 
Macintosh, Linux, and UNIX versions). 

Without going into further details on Maple itself, we refer to the Web 
site www.maplesoft.com that offers in-depth topical information, including 
product demos and downloadable technical materials. 

The core of the Global Optimization Toolbox for Maple is a customized 
implementation of the LGO solver suite (Maplesoft, 2004) that, as an add-on 
product, upon installation, can be fully integrated with Maple. The advan- 
tage of this approach is that, in principle, the GOT can readily handle “all” 
continuous model functions that can be defined in Maple, including also new 
(user-defined) functions. 

We do not wish to go into programming details here, and assume that the 
key ideas shown by the illustrative Maple code snippets are easily understand- 
able to all readers with some programming experience. Maple commands are 
typeset in Courier bold font, following the so-called classic Maple input 
format. The input commands are typically followed by Maple output lines, 
unless the latter are suppressed by using the symbol “:” instead of “;” at 
the end of an input line. 

In the numerical experiments described below, an AMD Athlon 64 (3200+, 
2GHz) processor-based desktop computer has been used that runs under 
Windows XP Professional (Version 2002, Service Pack 2). 


11.5.1 Getting Started with the Global Optimization 
Toolbox 


To illustrate the basic usage of the Toolbox, let us revisit model (11.3). The 
Maple command 


> with(GlobalOptimization);

makes possible the direct invocation of the subsequently issued, GOT-related
commands. Then the next Maple command numerically solves model (11.3): 
the response line below the command displays the approximate optimum 
value, and the corresponding solution argument. 


> GlobalSolve(cos(x)*sin(x^2-x), x=1..10);


[-.990613849411236758, [x = 9.28788130421885682]] 

The detailed runtime information (not shown here) indicates that the total
number of function evaluations is 1262; the associated runtime is a small 
fraction of a second. 

Recall here Figure 11.1 which, after careful inspection, indicates that this
is indeed the (approximate) global solution. (One can also see that the default
visualization, as in other modeling environments, has some difficulty
depicting this rapidly changing function.) There are several local solutions
that are fairly close to the global one: two of these numerical solutions are


[-.979663995439954860, [x = 3.34051270473064265]],
and 
[-.969554320487729716, [x = 6.52971402762202757]]. 
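
The three reported points can be cross-checked independently of Maple. The sketch below re-evaluates the objective cos(x)·sin(x² − x) at the solution arguments quoted above and then runs a brute-force grid scan of [1, 10]; the grid size is an arbitrary choice for this illustration, and a scan of this kind is only feasible because the model is one-dimensional.

```python
import math

def model_11_3(x):
    # Objective of model (11.3) as passed to GlobalSolve above.
    return math.cos(x) * math.sin(x**2 - x)

# (argument, value) pairs copied from the solver output quoted in the text.
reported = {
    "global": (9.28788130421885682, -0.990613849411236758),
    "local_1": (3.34051270473064265, -0.979663995439954860),
    "local_2": (6.52971402762202757, -0.969554320487729716),
}
for name, (x, value) in reported.items():
    print(name, x, model_11_3(x), value)

# Brute-force scan with grid step 1e-4 (illustrative choice).
n = 90000
grid_f, grid_x = min(
    (model_11_3(1.0 + 9.0 * i / n), 1.0 + 9.0 * i / n) for i in range(n + 1)
)
print(grid_x, grid_f)
```

The scan agrees with the reported global solution to within the grid resolution, using roughly seventy times as many function evaluations as the 1262 reported for the solver run.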


Similarly, the next statement returns an approximate global solution in 
the visibly nontrivial model (11.4): 


> GlobalSolve(cos(x)*sin(y^2-x)+cos(y)*sin(x^2-y),
x=1..10, y=1..10);


[-1.95734692335253380,
[x = 3.27384194476651214, y = 6.02334184076140478]].


The result shown above has been obtained using GOT default settings: the 
total number of function evaluations in this case is 2587, and the runtime is 
still practically zero. Recall now also Figure 11.2 and the discussion related
to the possible numerical difficulty of GO models. The solution found by
the GOT is global-search-based, but without a rigorous deterministic guar- 
antee of its quality. Let us emphasize that to obtain such guarantees (e.g., by 
using interval-arithmetic-based solution techniques) can be a very resource- 
demanding exercise, especially in more complex and/or higher-dimensional 
models, and that it may not be possible, for example, in “black box” situa- 
tions. A straightforward way to attempt finding a better quality solution is to 
increase the allocated global search effort. Theoretically, using an “infinite” 
global search effort will lead to an arbitrarily close numerical estimate of the 
global optimum value. In the next statement we set the global search effort 
to 1000000 steps (this limit is applied only approximately, due to the possible 
activation of other stopping criteria): 


> GlobalSolve(cos(x)*sin(y^2-x)+cos(y)*sin(x^2-y),
x=1..10, y=1..10, evaluationlimit=1000000,
noimprovementlimit=1000000);


[-1.98122769882222882,
[x = 9.28788128193757068, y = 9.28788127177065270]].


Evidently, we have found an improved solution, at the expense of a sig- 
nificantly increased global search effort. (Now the total number of function 
evaluations is 942439, and the runtime is approximately 5 seconds.) In gen- 
eral, more search effort can always be added, in order to verify or perhaps 
improve the incumbent numerical solution. 

Comparing now the solution obtained to that of model (11.3), and observ- 
ing the obvious formal connection between the two models, one can deduce 
that now we have found a close numerical approximation of the true global 
solution. Simple modeling insight also tells us that the global optimum value in
model (11.4) is bounded from below by -2, because each of its two product terms
is at least -1. Hence, even without Figures 11.1
and 11.2 we would know that the solution estimates produced above must be 
fairly close to the best possible solution. 
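
The effect of added search effort, and the use of the lower bound -2, can be illustrated with a deliberately naive sketch. It uses pure random sampling with a fixed seed, which is not the GOT search strategy; the function names and the two budgets are assumptions made for this example.

```python
import math
import random

def model_11_4(x, y):
    # Objective of model (11.4) as passed to GlobalSolve above.
    return math.cos(x) * math.sin(y**2 - x) + math.cos(y) * math.sin(x**2 - y)

def random_search(budget, seed=0):
    """Best of `budget` uniform samples in [1, 10] x [1, 10]."""
    rng = random.Random(seed)
    best = (float("inf"), None, None)
    for _ in range(budget):
        x = rng.uniform(1.0, 10.0)
        y = rng.uniform(1.0, 10.0)
        value = model_11_4(x, y)
        if value < best[0]:
            best = (value, x, y)
    return best

# With the same seed, the larger run extends the smaller one, so the
# incumbent value can only stay the same or improve.
b_small = random_search(1000)[0]
b_large = random_search(100000)[0]
print(b_small, b_large)

# Objective values at the two solutions reported in the text.
default_run = model_11_4(3.27384194476651214, 6.02334184076140478)
long_run = model_11_4(9.28788128193757068, 9.28788127177065270)
print(default_run, long_run, "gap to the bound -2:", long_run + 2.0)
```

The gap between the incumbent value and the known bound gives a simple optimality certificate here; in most practical models no such bound is available, and added search effort is the main way to gain confidence in the incumbent.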

The presented examples illustrate several important points. 


• Global optimization models can be truly difficult to solve numerically, even
  in (very) low dimensions.

• It is not always possible to “guess” the level of difficulty. One cannot
  always (or at all) generate model visualizations similar to Figures 11.1 and
  11.2, even in chosen variable subspaces, because it could be too expensive
  numerically, even if we have access to suitable graphics facilities. Insight
  and model-specific expertise can help significantly, and these should be
  used whenever possible.

• There is no solver that will handle all possible instances from the general
  CGO model class within an arbitrary prefixed amount of search effort.
  In practice, one needs to select and recommend default solver parameters
  and options that “work well in most cases, based on an acceptable amount
  of effort.” Considering the fact that practically motivated modeling
  studies are often supported only by noisy and/or scarce data, this pragmatic
  approach is justifiable in many practical situations.

• The default solver settings should return a global-search-based
  high-quality feasible solution (arguably, the models (11.3) and (11.4) can be
  considered as difficult instances for their low dimensionality).
  Furthermore, it should be easy to modify the default solver settings and to
  repeat runs, if this is deemed necessary.


The GOT software implementation automatically sets default parameter 
values for its operations, partly based on the model to solve. These settings 
are suitable in most cases, but the user can always assign (i.e., override) them. 
Specifically, one can select the following options and parameter values. 


• Minimization or maximization model

• Search method (BB+LS, GARS+LS, MS+LS, or standalone LS)

• Initial solution vector setting (used by the LS operational mode), if
  available

• Constraint penalty multiplier: this is used by BB, GARS, and MS, in an
  aggregated merit function (recall that the LS method handles all model
  functions individually)

• Maximal number of merit function evaluations in the selected global search
  mode

• Maximal number of merit function evaluations in the global search mode,
  without merit function value improvement

• Acceptable target value for the merit function, to trigger an “operational
  switch” from global to local search mode

• Feasibility tolerance used in LS mode

• Karush-Kuhn-Tucker local optimality tolerance in LS mode

• Solution (computation) time limit


For further information regarding the GOT, consult the product Web page 
(Maplesoft, 2004), the article (Pintér et al., 2006), and the related Maple help 
system entries. The product page also includes links to detailed interactive 
demos, as well as to downloadable application examples. 


11.5.2 Handling (General) Constrained Global 
Optimization Models 


Systems of nonlinear equations play a fundamental role in quantitative stud- 
ies, because equations are often used to characterize the equilibrium states 
and optimality conditions of physical, chemical, biological, or other systems. 
In the next example we formulate and solve a system of equations. At the 
same time, we also illustrate the use of a general model development style that 
is easy to follow in Maple, and—mutatis mutandis—also in other modeling 
systems. Consider the equations 


> eq1 := exp(x-y)+sin(2*x)-cos(y+z)=0:                      (11.5)
  eq2 := 4*x-exp(z-y)+5*sin(6*x-y)+3*cos(3*x*y)=0:
  eq3 := x*y*z-10=0:


To solve this system of equations, let us define the optimization model 
components as shown below (notice the dummy objective function). 


> constraints := {eq1,eq2,eq3}:
> bounds := x=-2..2, y=-1..3, z=2..4: 
> objective:=0: 


Then the next Maple command is aimed at generating a numerical solution 
to (11.5), if such solution exists. 




> solution :=
GlobalSolve(objective, constraints, bounds);


solution := [0.,
[x=1.32345978290539557,y=2.78220763578413344,z=2.71581206431678090]].


This solution satisfies all three equations with less than $10^{-9}$ error, as
verified by the next statement:


> eval(constraints, solution[2]);

{-0.1·10^(-9) = 0, -0.6·10^(-9) = 0, 0 = 0}


Without going into details, let us note that multiple solutions to (11.5) 
can be found (if such solutions exist), for example, by iteratively adding 
constraints that will exclude the solution(s) found previously. Furthermore, 
if a system of equations has no solutions, then using the GOT we can obtain 
an approximate solution that has globally minimal error over the box search 
region, in a given norm: consult Pintér (1996a) for details. 
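
The feasibility check, and the idea of a globally minimal error norm, can also be reproduced outside Maple. The sketch below evaluates the residuals of system (11.5) at the reported solution and defines a maximum-norm error function of the kind one could minimize over the box when no exact solution exists; the helper names are assumptions made for this illustration.

```python
import math

def residuals(x, y, z):
    """Left-hand sides of the three equations in (11.5)."""
    r1 = math.exp(x - y) + math.sin(2 * x) - math.cos(y + z)
    r2 = 4 * x - math.exp(z - y) + 5 * math.sin(6 * x - y) + 3 * math.cos(3 * x * y)
    r3 = x * y * z - 10
    return (r1, r2, r3)

def error_norm(x, y, z):
    # Maximum-norm error; a candidate objective for a least-error reformulation.
    return max(abs(r) for r in residuals(x, y, z))

# Solution and bounds copied from the text.
solution = (1.32345978290539557, 2.78220763578413344, 2.71581206431678090)
bounds = ((-2.0, 2.0), (-1.0, 3.0), (2.0, 4.0))

print(residuals(*solution))
print(error_norm(*solution))
print(all(lo <= v <= hi for v, (lo, hi) in zip(solution, bounds)))
```

An error function of this kind is nonnegative and equals zero exactly at the solutions of the system, so solving the equations and minimizing the error globally are equivalent whenever a solution exists within the bounds.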

Next, we illustrate the usage of the GOT in interactive mode. The state- 
ment shown below directly leads to the Global Optimization Assistant dialog, 
see Figure 11.3. 


> solution:= 
Interactive(objective, constraints, bounds) ; 


Using the dialog, one can also directly edit (modify) the model formulation 
if necessary. The figure shows that the default (MS+LS) GOT solver mode 
returns the solution presented above. Let us point out here that none of the 
local solver options indicated in the Global Optimization Assistant (see the 
radio buttons under Solver) is able to find a feasible solution to this model. 
This finding is not unexpected: rather, it shows the need for a global scope 
search approach to handle this model and many other similar problems. 

Following the numerical solution step, one can press the Plot button 
(shown in the lower right corner in Figure 11.3). This will invoke the Global 
Optimization Plotter dialog shown in Figure 11.4. In the given subspace (x, y)
that can be selected by the GOT user, the surface plot shows the identically 
zero objective function. Furthermore, on its surface level one can see the con- 
straint curves and the location of the global solution found: in the original 
color figure this is a light green dot close to the boundary as indicated by 
the numerical values found above. Notice also the option to select alternative 
subspaces (defined by variable pairs) for visualization. 

The figures can be rotated, thereby offering the possibility of detailed 
model function inspection. Such inspection can help users to increase their 
understanding of the model. 

Fig. 11.3 Global Optimization Assistant dialog for model (11.5).


Fig. 11.4 Global Optimization Plotter dialog for model (11.5).


11.5.3 Optimization Models with Embedded
Computable Functions 


It was pointed out earlier (in Section 11.1) that in advanced decision models 
some model functions may require the execution of various computational 
procedures. One of the advantages of using an ISTC system such as Maple 
is that the needed functionality to perform these operations is often read- 
ily available, or directly programmable. To illustrate this point, in the next 
example we show the globally optimized argument value of an objective function
defined by Bessel functions. As is well known, the function BesselJ(v, x)
satisfies Bessel's differential equation

$$x^2 y'' + x y' + (x^2 - v^2)\,y = 0. \tag{11.6}$$

In (11.6) x is the function argument, and the real value v is the order (or
index parameter) of the function. The evaluation of BesselJ requires the
solution function of the differential equation (11.6), for the given value of
v, and then the calculation of the corresponding function value for argument
x. For example, BesselJ(0, 2) ≈ 0.2238907791; consult Maple's help system for
further details. Consider now the optimization model defined and solved below:


> objective := BesselJ(2,x)*BesselJ(3,y)-                   (11.7)
  BesselJ(5,y)*BesselJ(7,x):

> bounds := x=-10..20, y=-15..10:

> solution := GlobalSolve(objective, bounds);


solution := [-.211783151218360000,
[x = -3.06210564091438720, y = -4.20467390983796196]].


The corresponding external solver runtime is about 4 seconds. The next fig- 
ure visualizes the box-constrained optimization model (11.7). Here a simple 
inspection and rotation of Figure 11.5 helps to verify that the global solu- 
tion is found indeed. Of course, this would not be directly possible in general 
(higher-dimensional or more complicated) models: recall the related earlier 
discussion and recommendations from Section 11.5.1. 
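
To make the notion of an embedded computable function concrete, the following sketch evaluates the objective of model (11.7) without a special-function library. For integer order n it uses Bessel's integral J_n(x) = (1/π) ∫₀^π cos(nτ − x sin τ) dτ with the trapezoidal rule; this is a swapped-in technique chosen for the illustration, not the method Maple uses, and the function names and the panel count are assumptions.

```python
import math

def bessel_j(n, x, panels=400):
    """J_n(x) for integer n via Bessel's integral and the trapezoidal rule."""
    h = math.pi / panels
    # Endpoint contributions at tau = 0 and tau = pi, each with weight 1/2.
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for k in range(1, panels):
        tau = k * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def objective_11_7(x, y):
    # Objective of model (11.7) as defined in the Maple input above.
    return bessel_j(2, x) * bessel_j(3, y) - bessel_j(5, y) * bessel_j(7, x)

print(bessel_j(0, 2.0))  # compare with BesselJ(0, 2) quoted in the text
print(objective_11_7(-3.06210564091438720, -4.20467390983796196))
```

Each objective evaluation here runs a numerical integration loop four times, which is the kind of embedded procedure a derivative-free solver has to treat as a black box.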


Fig. 11.5 Optimization model objective defined by Bessel functions.


11.6 Global Optimization: Applications and
Perspectives


In recent decades, global optimization has gradually become an established
discipline that is now taught worldwide at leading academic institutions.
GO methods and software are also increasingly applied in various research
contexts, including industrial and consulting practice. The currently available
professional software implementations are routinely used to solve models with
tens, hundreds, and sometimes even thousands of variables and constraints.
Recall again the caveats mentioned earlier regarding the potential numerical 
difficulty of model instances: if one is interested in a guaranteed high-quality 
solution, then the necessary runtimes could become hours (or days, or more), 
even on today’s high-performance computers. One can expect further speed- 
up due to both algorithmic improvements and progress in hardware/software 
technology, but the theoretically exponential “curse of dimensionality” asso- 
ciated with the subject of GO will always be there. 

In the most general terms, global optimization technology is well suited 
to analyze and solve models in advanced (acoustic, aerospace, chemical, con- 
trol, electrical, environmental, and other) engineering, biotechnology, econo- 
metrics and financial modeling, medical and pharmaceutical studies, process 
industries, telecommunications, and other areas. 

For detailed discussions of examples and case studies consult, for exam- 
ple, Grossmann (1996), Pardalos et al. (1996), Pintér (1996a), Corliss and 
Kearfott (1999), Papalambros and Wilde (2000), Edgar et al. (2001), Gao et 
al. (2001), Schittkowski (2002), Tawarmalani and Sahinidis (2002), Zabinsky 
(2003), Neumaier (2006), Nowak (2005), and Pintér (2006a), as well as other 
topical works. 

For example, recent numerical studies and applications in which LGO 
implementations have been used are described in the following works: 


• Cancer therapy planning (Tervo et al., 2003)

• Combined finite element modeling and optimization in sonar equipment
  design (Pintér and Purcell, 2003)

• Laser equipment design (Isenor et al., 2003)

• Model calibration (Pintér, 2003a, 2006b)

• Numerical performance analysis on a collection of test and “real-world”
  models (Pintér, 2003b, 2006b)

• Physical object configuration analysis and design (Kampas and Pintér,
  2006)

• Potential energy models in computational chemistry (Pintér, 2000, 2001b,
  Stortelder et al., 2001)

• Circle packing models and their industrial applications (Kampas and
  Pintér, 2004, Pintér and Kampas, 2005a,b, Castillo et al., 2008)


The forthcoming volumes by Kampas and Pintér (2009) and Pintér (2009) 
also discuss a large variety of GO applications, with extensive references. 


11.7 Conclusions 


Global optimization is a subject of growing practical interest as indicated by 
recent software implementations and by an increasing range of applications. 
In this work we have discussed some of these developments, with an emphasis 
on practical aspects. 

In spite of remarkable progress, global optimization remains a field of ex- 
treme numerical challenges, not only when considering “all possible” GO 
models, but also in practical attempts to handle complex and sizeable prob- 
lems within an acceptable timeframe. The present discussion advocates a 
practical solution approach that combines theoretically rigorous global search 
strategies with efficient local search methodology, in integrated, flexible solver 
suites. The illustrative examples presented here, as well as the applications 
referred to above, indicate the practical viability of such an approach. 

The practice of global optimization is expected to grow dynamically. We 
welcome feedback regarding current and future development directions, new 
test challenges, and new application areas. 


Chapter 12


Two-Stage Stochastic Mixed-Integer 
Programs: Algorithms and Insights 


Hanif D. Sherali and Xiaomei Zhu 


Summary. Stochastic (mixed-) integer programs pose a great algorithmic 
and computational challenge in that they combine two generally difficult 
classes of problems: stochastic programs and discrete optimization problems. 
Various decomposition methods that exploit their dual angular structure have
been widely studied, including Benders’ decomposition, Lagrangian
relaxation, and test-set decomposition. These decomposition methods are often
combined with search procedures such as branch-and-bound or branch-and- 
cut. Within the confines of these broad frameworks, fine-tuned algorithms 
have been proposed to overcome obstacles such as nonconvexity of the second- 
stage value functions under integer recourse, and to take advantage of the 
many similar structured scenario subproblems using variable transformations. 
In this chapter, we survey some recent algorithms developed to solve two- 
stage stochastic (mixed-) integer programs, as well as provide insights into 
and results concerning their interconnections, particularly for alternative con- 
vexification techniques. 


Key words: Two-stage stochastic mixed-integer programs, L-shaped meth- 
od, Benders’ decomposition, branch-and-cut, convexification, reformulation- 
linearization technique (RLT), disjunctive programming 


12.1 Introduction 


In this chapter, we discuss two-stage stochastic mixed-integer programs 
(SMIP) of the following form. 
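
$$
\begin{aligned}
\textbf{SMIP:}\quad \text{Minimize}\quad & cx + E[f(x,\tilde{\omega})] && (12.1a)\\
\text{subject to}\quad & Ax \ge b && (12.1b)\\
& x \ge 0 && (12.1c)\\
& x_i \ \text{binary},\ \forall i \in I_b \subseteq I = \{1,\ldots,n_1\} && (12.1d)\\
& x_i \ \text{integer},\ \forall i \in I_{\mathrm{int}} \subseteq I \setminus I_b, && (12.1e)
\end{aligned}
$$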


where $\tilde{\omega}$ is a random variable defined on a probability space $(\Omega, \mathcal{A}, P)$ (with
$\Omega$, $\mathcal{A}$, and $P$, respectively, denoting the set of all outcomes, a collection of
random variables, and their associated probability distributions), and where
for any given realization $\omega$ of $\tilde{\omega}$, we have


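$$
\begin{aligned}
f(x,\omega) = \text{minimum}\quad & g(\omega)y && (12.1f)\\
\text{subject to}\quad & W(\omega)y \ge r(\omega) - T(\omega)x && (12.1g)\\
& y \ge 0 && (12.1h)\\
& y_j \ \text{binary},\ \forall j \in J_b \subseteq J = \{1,\ldots,n_2\} && (12.1i)\\
& y_j \ \text{integer},\ \forall j \in J_{\mathrm{int}} \subseteq J \setminus J_b. && (12.1j)
\end{aligned}
$$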


In the above, $A$ is an $m_1 \times n_1$ matrix, and for each $\omega$, $W(\omega)$ is an $m_2 \times n_2$
recourse matrix, $T(\omega)$ is an $m_2 \times n_1$ technology matrix, and the other defined
vectors are of corresponding conformable sizes. We assume that the elements
of $A$, $T(\omega)$, and $W(\omega)$ are rational (so that they are scalable to integers, if
necessary). For computational viability, a finite number of scenarios, denoted
as a set $S$ and indexed by $s$, is often considered based on some discretization
of the possible realizations of $\tilde{\omega}$, each with an associated probability of
occurrence $p_s$, $s \in S$. (See [29] for a justification on approximating
continuously distributed scenario parameters by a discrete distribution having a
finite support.) Accordingly, the realizations of $g(\omega)$, $W(\omega)$, $T(\omega)$, and $r(\omega)$
are denoted as $g_s$, $W_s$, $T_s$, and $r_s$, respectively, for $s \in S$. In
this chapter, we assume such a discrete probability distribution.

Arguably, SMIP is among the most challenging of optimization problems 
because it combines two generally difficult classes of problems: stochastic 
programs and discrete optimization problems. Building on theories, particularly
in large-scale optimization and integer programming ([25, 24]), researchers
have actively studied the properties of and solution approaches for
such problems. We refer the reader to the extensive survey papers by Schultz 
et al. [31], Klein Haneveld and van der Vlerk [21], Schultz [30], and Sen [34], 
and an annotated bibliography by Stougie and van der Vlerk [42] for a dis- 
cussion on the principal properties of SMIPs and some earlier algorithmic 
developments in this area. An important group of stochastic integer programs
consists of problems having simple integer recourse. Research in this area, such
as that by Klein Haneveld et al. [19, 20], has been covered in detail in the
aforementioned surveys. In this chapter, we focus on certain recent algorithmic
advances in solving two-stage stochastic (mixed-) integer programs. We
also provide insights into their interconnections, and present some results 
relating the convexification strategies afforded by the reformulation-linear- 
ization technique (RLT) and disjunctive programming methods for solving 
mixed-integer 0-1 SMIPs. 

The remainder of this chapter is organized as follows. In Section 12.2, 
we survey different decomposition frameworks that have been used to solve 
SMIPs, including decomposition by stage (primal decomposition), decompo- 
sition by scenario (dual decomposition), and test-set decomposition. In Sec- 
tion 12.3, we exhibit certain insightful relationships between some convexifi- 
cation methods that have been used for solving mixed-integer 0-1 stochastic 
problems, particularly, employing disjunctive programming and RLT-based 
cutting plane methods. In Section 12.4, we discuss three enumerative meth- 
ods using tender variables when the technology matrix is fixed. We conclude 
this chapter in Section 12.5. 


12.2 Stage, Scenario, and Test-Set Decomposition 


Collecting the problems for the two stages together, and denoting the
constraints (12.1b)–(12.1e) as $x \in X$ and the constraints (12.1h)–(12.1j) written
for scenario $s$ as $y^s \in Y^s$, the deterministic equivalent form for (12.1) is given
as follows.


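$$
\begin{aligned}
\text{Minimize}\quad & cx + \sum_{s \in S} p_s g_s y^s && (12.2a)\\
\text{subject to}\quad & T_s x + W_s y^s \ge r_s,\ \forall s \in S && (12.2b)\\
& x \in X,\ y^s \in Y^s,\ \forall s \in S. && (12.2c)
\end{aligned}
$$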
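
A tiny instance helps to fix ideas about the deterministic equivalent and its block structure. In the sketch below all data are hypothetical (one integer first-stage variable, one integer recourse variable per scenario, three scenarios), and the problem is solved by brute-force enumeration rather than by any of the decomposition methods surveyed in this chapter. Once x is fixed, the scenario problems separate, which is the property that the decomposition schemes exploit.

```python
# Hypothetical toy data, not taken from the chapter.
c = 2.0                       # first-stage unit cost
scenarios = [                 # (p_s, g_s, T_s, W_s, r_s), all scalars here
    (0.3, 5.0, 1.0, 1.0, 1.0),
    (0.5, 5.0, 1.0, 1.0, 2.0),
    (0.2, 5.0, 1.0, 1.0, 4.0),
]
X = range(0, 5)               # first-stage feasible set: x in {0, ..., 4}
Y = range(0, 5)               # second-stage feasible set for every scenario

def recourse_value(x, g, T, W, r):
    """Scenario value f(x, s): cheapest y in Y with W*y >= r - T*x, cf. (12.1g)."""
    costs = [g * y for y in Y if W * y >= r - T * x]
    return min(costs) if costs else float("inf")

def expected_cost(x):
    """Deterministic-equivalent objective for a fixed first-stage decision x."""
    return c * x + sum(
        p * recourse_value(x, g, T, W, r) for p, g, T, W, r in scenarios
    )

for x in X:
    print(x, expected_cost(x))

best_x = min(X, key=expected_cost)
best_value = expected_cost(best_x)
print("best first-stage decision:", best_x, "expected cost:", best_value)
```

Enumeration is only viable because the toy feasible sets are tiny; with n₁ binary first-stage variables the outer loop alone would visit 2^{n₁} points.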


This representation reveals a dual angular structure that lends itself well 
to decomposition schemes. We discuss two major groups of decomposition 
methods, by stages and by scenarios, as well as a novel approach called test- 
set decomposition. Decomposition by stages, also known as primal decom- 
position, essentially adopts the framework of Benders’ decomposition (Ben- 
ders [9]), and is more popularly known as the L-shaped method (Van Slyke 
and Wets [43]) in the context of stochastic programming. In this approach, 
for each first-stage solution $\bar{x}$ produced by a master program, one solves the
corresponding second-stage problem (also called a scenario subproblem, or
simply subproblem) based on Problem (12.2) with $x$ fixed at $\bar{x}$. We discuss
these methods in Section 12.2.1. Scenario decomposition methods, also known 
as dual decomposition, work by relaxing and restoring the nonanticipativity
condition, which, in a two-stage problem setting, simply means that all
scenario outcomes should be based on some identical first-stage solution. 
When this condition is relaxed, smaller-scaled problems corresponding to 
scenarios are obtained that are easier to solve, but they also yield different 
first-stage solutions, which need to be reconciled. These methods are covered 
in Section 12.2.2. In Section 12.2.3, we describe the test-set decomposition 
approach by Hemmecke and Schultz [17], which decomposes the problem’s 
Graver test-sets, instead of the problem itself. 


12.2.1 Stage Decomposition — L-Shaped Methods 


The simplest form of stochastic mixed-integer programs contains purely bi- 
nary first-stage variables and purely continuous second-stage variables. In 
this case, Benders’ decomposition, or the L-shaped method, can be easily 
applied using some form of enumeration on the binary first-stage variables, 
as in Wollmer [44]. Alternatively, and also more generally, if the second- 
stage problems are easy to solve (not necessarily purely continuous), then 
these problems can be solved using the integer L-shaped method of Laporte 
and Louveaux [23]. This is a branch-and-cut (B&C) procedure that is imple- 
mented in the projected space of the first-stage variables. At each node, the 
two-stage problem is solved using Benders’ decomposition, which generates 
feasibility and/or optimality cuts for the first-stage solution, as necessary. The 
Benders master problem (or the “current problem” of the L-shaped method) 
at some node q then takes the form: 


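\begin{align}
\text{Minimize} \quad & cx + \eta \tag{12.3a} \\
\text{subject to} \quad & D_k x \ge d_k, \quad \forall k = 1, \ldots, F_q \tag{12.3b} \\
& L_k x + \eta \ge l_k, \quad \forall k = 1, \ldots, O_q \tag{12.3c} \\
& x \in X, \quad \eta \text{ unrestricted}, \tag{12.3d}
\end{align}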


where $k = 1, \ldots, F_q$ and $k = 1, \ldots, O_q$, respectively, index the feasibility cuts
(12.3b) and the optimality cuts (12.3c) at node $q$, and $\eta$ is a variable
representing an approximation of the expected second-stage objective value (or
the second-stage value function). If feasibility of the subproblems is
guaranteed for any fixed $x$, or for any $x \in X$, respectively known as the case of complete
recourse or relatively complete recourse, then feasibility cuts will not be gen- 
erated from the second-stage problems, and (12.3b) will only include certain 
relevant constraints that fix particular components of x at 0 or 1 as per the 
branching restrictions. A nodal problem at node q in the B&C tree is re- 
solved whenever a feasibility or optimality cut is added. It is fathomed by 
the infeasibility of (12.3) or by the bounding processes, and is otherwise
partitioned via branching if the solution to the continuous relaxation does not
satisfy the binary restrictions. Unlike in a typical branch-and-bound (B&B) 
process, whenever integrality is satisfied and reveals a potentially better in- 
cumbent solution to the current (relaxed) master program, the node is not 
necessarily fathomed. Instead, the subproblem is solved to possibly generate 
an optimality cut. 

Theoretically, this framework of embedding the L-shaped method in a 
B&C process applies to two-stage SMIPs in which the first stage contains 
binary or integer variables and valid optimality cuts are obtainable from the 
second-stage problems. In practice, however, valid optimality cuts are not 
easy to derive unless the second stage contains purely continuous variables, 
or the first stage contains purely binary variables. In the former case, the 
second-stage value functions $f_s(\cdot)$ are piecewise linear and convex, and so
valid optimality cuts can be derived using the dual variables directly. In the
latter case, for any binary feasible solution $x^r$, defining
$I_r = \{i \in I : x_i^r = 1\}$, the following is a set of valid optimality cuts:

\[
\eta \ge (Q(x^r) - L) \Big[ \sum_{i \in I_r} x_i - \sum_{i \notin I_r} x_i - |I_r| + 1 \Big] + L,
\quad \forall r = 1, \ldots, R, \tag{12.4}
\]

where $Q(x) = \sum_{s \in S} p_s f_s(x)$, $L$ is a lower bound on $Q$, and $1, \ldots, R$ are indices
of all the binary feasible solutions that have been encountered thus far.
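
As a quick check of how cut (12.4) behaves, the following Python sketch builds the cut for one binary solution and evaluates its right-hand side at other binary points. The function names `integer_lshaped_cut` and `cut_rhs` and the numerical data are illustrative assumptions, not part of the original method description.

```python
from itertools import product


def integer_lshaped_cut(x_r, q_value, lower_bound):
    """Return (coeffs, const) so that cut (12.4) reads
    eta >= sum(coeffs[i] * x[i]) + const.

    x_r         : binary first-stage solution (list of 0/1)
    q_value     : Q(x_r), expected second-stage value at x_r (assumed known)
    lower_bound : L, a valid lower bound on Q
    """
    slope = q_value - lower_bound
    size_i_r = sum(x_r)  # |I_r|
    # +slope for indices in I_r, -slope for indices outside I_r
    coeffs = [slope if xi == 1 else -slope for xi in x_r]
    const = -slope * (size_i_r - 1) + lower_bound
    return coeffs, const


def cut_rhs(coeffs, const, x):
    """Evaluate the right-hand side of the cut at a point x."""
    return sum(c * xi for c, xi in zip(coeffs, x)) + const


# Hypothetical data: x^r = (1, 0, 1), Q(x^r) = 10, L = 2.
coeffs, const = integer_lshaped_cut([1, 0, 1], 10.0, 2.0)
values = {x: cut_rhs(coeffs, const, x) for x in product([0, 1], repeat=3)}
```

At $x = x^r$ the bracket in (12.4) equals 1, so the cut forces $\eta \ge Q(x^r)$; at every other binary point the bracket is at most 0, so the cut is no stronger than $\eta \ge L$.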

When the second-stage problems contain integer variables, their value 
functions $f_s(\cdot)$ are lower semicontinuous (Blair and Jeroslow [11]) and in
general nonconvex and discontinuous. Thus optimality cuts are not readily
available from dual variables as in the continuous case. In such cases, when
the two-stage program has a deterministic cost vector g and recourse matrix
W, Carøe and Tind [14] propose to use a subset F of the dual price functions
of the second-stage integer programs (see Nemhauser and Wolsey [25] for 
dual price functions for integer programs) to derive feasibility cuts and opti- 
mality cuts. These functions are, however, nonlinear in general, resulting in a 
nonlinear form of Problem (12.3). Moreover, the dual price function class F 
has to be sufficiently large so that the duality gap between the second-stage 
problem and its (F-) dual is closed. The authors show that this is achieved 
when the second-stage problems are solved using cutting plane techniques or 
branch-and-bound. In these cases, finite termination is also established, given 
that (12.3) can be solved finitely. In particular, when Gomory’s cuts are ap- 
plied, the dual price functions can be transformed into linear functions having 
mixed-integer variables, and (12.3) is then a mixed-integer linear problem in 
lieu of a nonlinear problem. 

The above decomposition methods work with the primal second-stage 
problems, given any fixed first-stage solution. Within this framework, al- 
ternative methods have been developed to obtain optimality cuts in the case 
of integer recourse. We discuss some of these methods in Section 12.3. In 
Section 12.2.2 below, we turn to another type of decomposition approach
that relaxes the nonanticipativity condition and works with copies of the
first-stage solution to restore this condition. 


12.2.2 Scenario Decomposition — Relaxation of the 
Nonanticipativity Condition 


The second-stage problems are related to each other through the first-stage 
solution only. If we make a copy of the first-stage variable $x$ for each scenario,
say $x^s$ for $s \in S$, and enforce the nonanticipativity condition, $x^1 = x^s$, $\forall s \in S$,
conveniently written as $\sum_{s \in S} H^s x^s = 0$ for some suitable matrices $H^s$, then the
deterministic equivalent problem shown in (12.2) can be rewritten as follows:

\[
\min \Big\{ \sum_{s \in S} p_s (c x^s + g_s y^s) : (12.2b), (12.2c), \sum_{s \in S} H^s x^s = 0 \Big\}. \tag{12.5}
\]

The Lagrangian dual of (12.5) is $\max_{\lambda} \{ \sum_{s \in S} D_s(\lambda) \}$, where

\[
D_s(\lambda) = \min \{ p_s (c x^s + g_s y^s) + \lambda (H^s x^s) : (12.2b), (12.2c) \}, \quad \forall s \in S.
\]


The Lagrangian dual value yields a tighter lower bound for Problem (12.2) 
than its LP-relaxation. Carøe and Schultz [12] accordingly propose a dual
decomposition algorithm that uses the values of the Lagrangian dual as lower
bounds in a B&B process. The tradeoff here is having to solve a nonsmooth
Lagrangian dual problem compared to a linear program. At each nodal
problem, after solving the associated Lagrangian dual problem, the solutions $x^s$,
$\forall s \in S$, are averaged and rounded to obtain some $\bar{x}$ satisfying the integrality
restrictions. The objective value of (12.2) obtained after fixing $x$ at $\bar{x}$ is used
to update the upper bound. At the branching step, for a selected branching
variable $x_i$ (component of $x$), constraints $x_i \le \lfloor \bar{x}_i \rfloor$ and $x_i \ge \lceil \bar{x}_i \rceil$, or
$x_i \le \bar{x}_i - \epsilon$ and $x_i \ge \bar{x}_i + \epsilon$ for some tolerance $\epsilon > 0$, are applied at the two
child nodes, respectively, depending on whether $x_i$ is an integer or
continuous variable. This algorithm is finitely convergent if $X$ is bounded and the
x variables are purely integer-restricted. Schultz and Tiedemann [33] extend 
this approach to solve stochastic programs that include an additional objec- 
tive term based on the probability of a risk function exceeding a prespecified 
threshold value. A survey for SMIPs having risk measures can be found in 
Schultz [30]. 
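
The averaging, rounding, and branching steps of the dual decomposition scheme can be sketched as follows. This is a minimal illustration only: weighting the average by the scenario probabilities, rounding half up, and the helper names `average_and_round` and `branch_bounds` are assumptions made here, not details given in the text.

```python
import math


def average_and_round(scenario_solutions, probabilities, integer_indices):
    """Combine scenario copies x^s into one candidate first-stage point.

    The copies are averaged (here: probability-weighted, an assumption)
    and the integer-restricted components are rounded to the nearest integer.
    """
    n = len(scenario_solutions[0])
    avg = [
        sum(p * x[i] for p, x in zip(probabilities, scenario_solutions))
        for i in range(n)
    ]
    return [
        float(math.floor(avg[i] + 0.5)) if i in integer_indices else avg[i]
        for i in range(n)
    ]


def branch_bounds(value, is_integer, eps=1e-3):
    """Return the two child-node restrictions for a branching variable.

    Integer variable:    x_i <= floor(value)  and  x_i >= ceil(value)
    Continuous variable: x_i <= value - eps   and  x_i >= value + eps
    """
    if is_integer:
        return ("<=", math.floor(value)), (">=", math.ceil(value))
    return ("<=", value - eps), (">=", value + eps)


# Hypothetical data: three scenario copies of a two-component binary x.
x_bar = average_and_round(
    [[1, 0], [0, 0], [1, 1]], [0.5, 0.25, 0.25], integer_indices={0, 1}
)
children = branch_bounds(0.75, is_integer=True)
```

The rounded point is only a heuristic candidate; its objective value in (12.2) gives an upper bound, while the Lagrangian dual supplies the lower bound at the node.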

Alonso-Ayuso et al. [3] also relax the nonanticipativity condition in their 
branch-and-fix coordination algorithm (BFC) for pure 0-1 multistage sto- 
chastic programs, or mixed 0-1 two-stage stochastic programs having purely 
binary variables in the first stage. They assume deterministic technology and 
recourse matrices. In this algorithm, both the nonanticipativity condition 
and the binary restrictions on x are relaxed, resulting in a linear problem
for each scenario. A B&B tree is developed for each scenario at a terminal
node of the scenario tree, and the branch and fix operations are performed 
on the so-called “twin node families.” In a two-stage setting, a twin node 
family includes the nodal problems in the B&B trees for all scenarios that 
have the already fixed x-variables equal to the same value (0 or 1). To enforce
the nonanticipativity condition, the same branching variable is selected for 
the active nodes in a twin node family. Note that these nodes belong to dif- 
ferent scenarios. Two new twin node families are then formed by fixing the 
selected branching variable at 0 and 1, respectively. Lower bounds on the 
objective value are updated by calculating the deterioration of the objective 
value due to the fixing. Alonso-Ayuso et al. [4] demonstrated this approach 
in a set of supply chain planning problems, having up to 3933 constraints 
and 3768 variables (including 114 binary variables), using an 800 MHz
Pentium III processor with 512 MB of RAM. However, no comparison with other
algorithms or commercial software packages was made. 


12.2.3 Decomposition of Test-Sets 


When the cost vector g, the technology matrix T, and the recourse matrix 
W are all deterministic, and randomness only appears in the right-hand-side 
values $r_s$, Hemmecke and Schultz [17] take advantage of the dual angular
structure in (12.2) to develop a test-set decomposition-based approach for 
two-stage stochastic integer programs. If the problem contains continuous 
variables, certain inherited computational difficulties from such an approach 
demand extra care. 
A finite universal test-set for a family of integer problems

\[
(IP)_{c,b}: \quad \min \{ cz : Az = b, \; z \in \mathbb{Z}^d_+ \}, \tag{12.6}
\]

where $A \in \mathbb{Q}^{l \times d}$ is fixed, and where $c \in \mathbb{R}^d$ and $b \in \mathbb{R}^l$ vary, contains a set
of vectors that can be used to solve any problem in this family $(IP)_{c,b}$ for a
given $c$ and $b$ (Hemmecke [16]). In this process, an initial feasible solution is
first found using the test-set vectors, then an optimal solution is obtained by 
searching along improving directions, again using the vectors (called improv- 
ing vectors) in the test-set. The IP Graver test-set (Graver [15]) is one such 
finite universal test-set, and can be computed using the kernel of A, ker(A), 
as shown in Hemmecke [16]. 

A major drawback of using the test-set approach to solve integer programs 
is that a large number of vectors need to be stored, even for small-sized 
problems. Hence, for the typically large-sized stochastic integer programs, 
directly applying this approach is impractical. Exploring the dual angular 
structure of (12.2), Hemmecke and Schultz [17] show that Graver test-set 
vectors for this type of problem can be decomposed into, and then constructed
from, a small number of building-block vectors, and the latter can then be
used to solve large-scaled stochastic integer programs. 


Let

\[
A_{|S|} = \begin{pmatrix}
A & 0 & 0 & \cdots & 0 \\
T & W & 0 & \cdots & 0 \\
T & 0 & W & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
T & 0 & 0 & \cdots & W
\end{pmatrix}
\quad \text{and} \quad
A_1 = \begin{pmatrix} A & 0 \\ T & W \end{pmatrix}. \tag{12.7}
\]

Observing that $(u, v_1, \ldots, v_{|S|}) \in \ker(A_{|S|}) \Leftrightarrow (u, v_1), \ldots, (u, v_{|S|}) \in \ker(A_1)$,
Hemmecke and Schultz [17] propose to use the individual vectors $u, v_1, \ldots, v_{|S|}$
as building blocks of the vectors $(u, v_1, \ldots, v_{|S|})$ in the Graver test-set
of $A_{|S|}$. These building blocks, in lieu of the test-set vectors, are collected into
a set $\mathcal{H}_\infty$ and arranged in pairs $(u, V_u)$, where $V_u$ is the set of vectors $v$ such
that $(u, v) \in \ker(A_1)$. The set of building blocks $\mathcal{H}_\infty$ is shown to be finite, and
can be computed using a finite algorithm. The computation of $\mathcal{H}_\infty$ is expensive,
but it depends only on $A$, $W$, and $T$, and is independent of $|S|$, the cost
coefficients, and the right-hand-side values; hence, the proposed approach is
insensitive to the number of scenarios once $\mathcal{H}_\infty$ is obtained. After computing
and storing the building blocks, the test-set vectors are constructed during
the two steps of obtaining an initial feasible solution and finding improving
directions to solve (12.2). The finiteness of $\mathcal{H}_\infty$ guarantees the finiteness of
the proposed algorithm.
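
The kernel equivalence that underlies the building-block idea can be checked on a toy instance. The sketch below is an illustration only: the helper names (`build_full`, `in_kernel_full`, `in_kernel_blocks`) and the tiny integer matrices are assumptions, and the code tests kernel membership only; it does not compute Graver test-sets.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def build_full(A, T, W, num_scenarios):
    """Assemble the dual angular matrix A_{|S|} of (12.7) from A, T, W."""
    n_y = len(W[0])
    rows = [list(row) + [0] * (n_y * num_scenarios) for row in A]
    for s in range(num_scenarios):
        for t_row, w_row in zip(T, W):
            before = [0] * (n_y * s)
            after = [0] * (n_y * (num_scenarios - s - 1))
            rows.append(list(t_row) + before + list(w_row) + after)
    return rows


def in_kernel_full(A, T, W, u, vs):
    """Check (u, v_1, ..., v_|S|) in ker(A_{|S|}) directly."""
    full = build_full(A, T, W, len(vs))
    z = list(u) + [c for v in vs for c in v]
    return all(val == 0 for val in matvec(full, z))


def in_kernel_blocks(A, T, W, u, vs):
    """Check (u, v_s) in ker(A_1) separately for every scenario block."""
    a1 = build_full(A, T, W, 1)
    return all(
        all(val == 0 for val in matvec(a1, list(u) + list(v))) for v in vs
    )


# Hypothetical integer data with two first-stage and one second-stage variable.
A, T, W = [[1, -1]], [[1, 0]], [[-1]]
u = [1, 1]
good = ([1], [1])  # every pair (u, v_s) lies in ker(A_1)
bad = ([1], [2])   # the second pair violates T u + W v = 0
```

Both membership tests agree on each instance, which is exactly why storing pairs $(u, V_u)$ suffices to assemble full test-set vectors for any number of scenarios.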


12.3 Disjunctive Programming and RLT Cutting Plane 
Methods for Mixed 0—1 Stochastic Programs 


In this section, we assume the following. 


A1. No general integer variable exists in either stage; that is, $I_{\text{int}} = J_{\text{int}} = \emptyset$.

A2. The continuous variables in both stages are bounded. Moreover, for the 
purpose of exposition, the continuous variables in the second stage are 
scaled onto [0,1], with the corresponding bounding restrictions y < 1 
being absorbed within (or implied by) (12.1g). 

A3. For any feasible first-stage variable x € X, the second-stage problems 
are feasible (relative complete recourse). 


We study two groups of cutting plane approaches for solving two-stage 
SMIPs having 0-1 mixed-integer variables, and focus on the idea of sharing
cut coefficients among scenarios. The first group is based on disjunctive
programming, represented by the research of Carøe and Tind [13], Sen
and Higle [35], Sen and Sherali [36], and Ntaimo and Sen [27]. The second
group applies the reformulation-linearization technique (RLT) to derive
cutting planes, and includes the papers by Sherali and Fraticelli [39] and Sherali
and Zhu [41]. Both disjunctive cuts and RLT cuts are convexification cutting
planes applied to deal with the binary variables in the second-stage problems. 
In solving stochastic programs, particularly when the number of scenarios is 
large, memory space is crucial. If some cut coefficient can be shared between 
scenarios or iterations, we can gain much in memory savings and algorithmic 
efficacy. In Section 12.3.1, we therefore focus on the convexification process 
and cut-sharing properties for solving mixed-integer 0-1 second-stage prob- 
lems. In Section 12.3.2, we then discuss solution approaches for problems 
containing continuous first-stage variables, as these problems pose an extra 
challenge on assuring convergence. In Section 12.3.3, we relate the cuts gen- 
erated using disjunctive programming and the RLT technique to show their 
interconnections. 


12.3.1 Solving Mixed-Integer 0-1 Second-Stage 
Problems 


Given an index set $H$ and some polyhedral sets $P_h$, for $h \in H$, a disjunctive
set $P = \bigcup_{h \in H} P_h$ is said to be in disjunctive normal form. Using
disjunctive programming, we can characterize the closure of the convex hull of $P$
and generate valid inequalities and even facets for this representation (see,
for example, Blair and Jeroslow [10], Balas [5], and Sherali and Shetty [40]).
The class of 0-1 mixed-integer programs (MIP) belongs to a special type
of disjunctive programs called facial disjunctive programs, written in
conjunctive normal form, that is, a conjunction over $i \in I_b$ of the disjunction
$\{x_i = 0 \vee x_i = 1\}$. For this type of program, we can obtain the convex hull
of the feasible solution set using a sequential convexification process.

The lift-and-project algorithm proposed by Balas et al. [7, 8, 6] solves 0-1
mixed-integer programs using disjunctive programming. Carøe and Tind [13]
modified this approach and applied it to the deterministic equivalent form
of SMIP (Problem (12.2)) in which the first-stage variables are purely
continuous (i.e., $I_b = \emptyset$) and the second stage contains both continuous and
binary variables. Let $P^s$ be the set of solutions in the space of $(x, y^h, \forall h \in S)$
when only the constraints on $x$ and $y^s$ are considered. A direct application of
the lift-and-project method would generate cuts in the $(x, y^h, \forall h \in S)$ space,
sequentially treating a single variable $y^s_j$ for $j \in J_b$ and $s \in S$ as binary, and
the remaining $y$-variables as continuous. Exploiting the dual angular
structure of the deterministic equivalent form, Carøe and Tind [13] proposed the
generation of cuts for $P^s$ in the $(x, y^s)$ space for each scenario $s \in S$. This is
valid because all the $y^s$, $s \in S$, are independent, and by the assumption of
relatively complete recourse, the projection of $P^s$ and the projection of the
original solution set on the $(x, y^s)$ space are the same.

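To generate such cuts for a current solution $(\bar{x}, \bar{y}^s)$ for which the value for
$\bar{y}^s_q$, $q \in J_b$, is fractional, consider the disjunction $P^s = P^s_0 \cup P^s_1$, where

\begin{align}
P^s_0 = \{\, & Ax \ge b & & \leftarrow \pi_0 \tag{12.8a} \\
& T_s x + W_s y^s \ge r_s & & \leftarrow \lambda_0 \tag{12.8b} \\
& -y^s_q \ge 0 \,\} & & \leftarrow \varsigma_0 \tag{12.8c}
\end{align}

and

\begin{align}
P^s_1 = \{\, & Ax \ge b & & \leftarrow \pi_1 \tag{12.8d} \\
& T_s x + W_s y^s \ge r_s & & \leftarrow \lambda_1 \tag{12.8e} \\
& y^s_q \ge 1 \,\} & & \leftarrow \varsigma_1 \tag{12.8f}
\end{align}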

with the associated multipliers as shown above, and where we assume that
the nonnegativity and the bounding constraints for $x$ and $y$ are absorbed in
$Ax \ge b$ and $T_s x + W_s y^s \ge r_s$, respectively. Define $e_q$ as the $q$th unit row
vector. A convexification cutting plane for the disjunction (12.8) is generated
by solving the following linear program.


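\begin{align}
\text{Minimize} \quad & \alpha \bar{x} + \beta \bar{y}^s - \gamma \tag{12.9a} \\
\text{subject to:} \quad & \alpha - \pi_h A - \lambda_h T_s \ge 0, \quad \forall h = 0, 1 \tag{12.9b} \\
& \beta + \varsigma_0 e_q - \lambda_0 W_s \ge 0 \tag{12.9c} \\
& \beta - \varsigma_1 e_q - \lambda_1 W_s \ge 0 \tag{12.9d} \\
& \pi_0 b + \lambda_0 r_s - \gamma \ge 0 \tag{12.9e} \\
& \varsigma_1 + \pi_1 b + \lambda_1 r_s - \gamma \ge 0 \tag{12.9f} \\
& \alpha \bar{x} + \beta \bar{y}^s - \gamma \ge -1 \tag{12.9g} \\
& \pi, \lambda, \varsigma \ge 0, \quad \alpha, \beta, \gamma \text{ unrestricted}. \tag{12.9h}
\end{align}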


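Let $(\bar\alpha, \bar\beta, \bar\gamma, \bar\lambda, \bar\pi, \bar\varsigma)$ be an optimal solution of (12.9). If the objective value
obtained for (12.9a) is negative (i.e., equals $-1$ due to (12.9g)), the inequality

\[
\bar\alpha x + \bar\beta y^s \ge \bar\gamma \tag{12.10}
\]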


eliminates the current fractional solution. 

Carøe and Tind observe that if the recourse matrix and technology matrix
are fixed, (12.9b)-(12.9d) will be the same for all scenarios. Hence, $\bar\alpha$ and $\bar\beta$
obtained for one scenario will satisfy these constraints for all scenarios, and
the cuts generated for one scenario are valid for all scenarios by updating
only the right-hand sides. In particular, $\bar\alpha x + \bar\beta y^{s'} \ge \gamma_{s'}$ is valid for scenario
$s'$, $\forall s' \in S$, where $\gamma_{s'} = \bar\gamma + \min\{\bar\lambda_0 (r_{s'} - r_s), \bar\lambda_1 (r_{s'} - r_s)\}$. When only the
recourse matrix is fixed but the technology matrix is stochastic, constraints
(12.9c) and (12.9d) are satisfied by $\bar\beta$ for all $s' \in S$, and the following linear
program can be solved to obtain coefficients $\bar\alpha$ and $\bar\gamma$ for deriving a valid
inequality for scenario $s'$.


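\begin{align}
\text{Minimize} \quad & \alpha \bar{x} - \gamma \tag{12.11a} \\
\text{subject to:} \quad & \alpha - \pi_h A \ge \bar\lambda_h T_{s'}, \quad \forall h = 0, 1 \tag{12.11b} \\
& \pi_0 b - \gamma \ge -\bar\lambda_0 r_{s'} \tag{12.11c} \\
& \pi_1 b - \gamma \ge -\bar\varsigma_1 - \bar\lambda_1 r_{s'} \tag{12.11d} \\
& \alpha \bar{x} - \gamma \ge -1 \tag{12.11e} \\
& \pi \ge 0, \quad \alpha, \gamma \text{ unrestricted}. \tag{12.11f}
\end{align}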


Comparing this with (12.9b)-(12.9h), because the $\bar\beta$-value is already at hand,
the terms and constraints related to $\beta$ and $W$ are dropped, and this reduces
the size of the problem. Letting $(\bar\alpha, \bar\gamma, \bar\pi)$ be an optimal solution to (12.11),
the cut $\bar\alpha x + \bar\beta y^{s'} \ge \bar\gamma$ is valid for scenario $s'$, $s' \in S$.

In a similar fashion, if T and r are fixed and W is stochastic, or if W and 
r are fixed and T is stochastic, then a valid inequality for scenario s’ can be 
obtained by revising the valid inequality (12.10) for scenario s as above. 


Proposition 12.1. Let (12.10) be a valid inequality obtained for scenario $s$
by solving (12.9). We have:

(a) If the technology matrix $T$ and the right-hand side $r$ are fixed, and the
recourse matrix $W$ is stochastic, then

\[
\bar\alpha x + \beta_{s'} y^{s'} \ge \bar\gamma \tag{12.12a}
\]

is a valid inequality for scenario $s'$, where

\[
\beta_{s'} = \bar\beta + \max\{\bar\lambda_0 (W_{s'} - W_s), \bar\lambda_1 (W_{s'} - W_s)\}, \tag{12.12b}
\]

and where the max-operation for vectors is performed componentwise.

(b) Similarly, if the recourse matrix $W$ and the right-hand side $r$ are fixed,
and the technology matrix $T$ is stochastic, then

\[
\alpha_{s'} x + \bar\beta y^{s'} \ge \bar\gamma \tag{12.12c}
\]

is a valid inequality for scenario $s'$, where

\[
\alpha_{s'} = \bar\alpha + \max\{\bar\lambda_0 (T_{s'} - T_s), \bar\lambda_1 (T_{s'} - T_s)\}. \tag{12.12d}
\]


Proof. The proof follows the same line as that of the fixed $T$ and $W$ case in
Carøe and Tind [13]. In the case of fixed $T$ and $r$, we have that the coefficients
$\bar\alpha$ and $\bar\gamma$ satisfy constraints (12.9b), (12.9e), and (12.9f) for $s'$, because $T_s$ and
$r_s$ are identical for all scenarios. Furthermore, from (12.12b),

\[
\beta_{s'} \ge -\bar\varsigma_0 e_q + \bar\lambda_0 W_s + \bar\lambda_0 (W_{s'} - W_s) = -\bar\varsigma_0 e_q + \bar\lambda_0 W_{s'}
\]

and

\[
\beta_{s'} \ge \bar\varsigma_1 e_q + \bar\lambda_1 W_s + \bar\lambda_1 (W_{s'} - W_s) = \bar\varsigma_1 e_q + \bar\lambda_1 W_{s'},
\]

which satisfies (12.9c) and (12.9d) for scenario $s'$. This proves Part (a), and
Part (b) can be proved similarly.
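
The two cut-sharing updates discussed above, the right-hand-side update for fixed $T$ and $W$ and the coefficient update of Proposition 12.1(a), amount to a few vector operations. The following Python sketch uses hypothetical helper names (`shared_rhs`, `shared_beta`) and made-up multiplier values; it assumes the multipliers $\bar\lambda_0$, $\bar\lambda_1$ are already available from solving (12.9) for scenario $s$.

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def vecmat(row, matrix):
    """Row vector times matrix (matrix given as a list of rows)."""
    return [
        sum(r * matrix[i][j] for i, r in enumerate(row))
        for j in range(len(matrix[0]))
    ]


def shared_rhs(gamma, lam0, lam1, r_s, r_sp):
    """Fixed T and W: gamma_{s'} = gamma + min{lam0 (r_{s'} - r_s), lam1 (r_{s'} - r_s)}."""
    diff = [b - a for a, b in zip(r_s, r_sp)]
    return gamma + min(dot(lam0, diff), dot(lam1, diff))


def shared_beta(beta, lam0, lam1, W_s, W_sp):
    """Fixed T and r, as in (12.12b):
    beta_{s'} = beta + componentwise max{lam0 (W_{s'} - W_s), lam1 (W_{s'} - W_s)}."""
    diff = [[b - a for a, b in zip(row_s, row_sp)] for row_s, row_sp in zip(W_s, W_sp)]
    shift0, shift1 = vecmat(lam0, diff), vecmat(lam1, diff)
    return [b + max(s0, s1) for b, s0, s1 in zip(beta, shift0, shift1)]


# Hypothetical multipliers and scenario data (two constraints, two y-variables).
lam0, lam1 = [1.0, 0.0], [0.0, 2.0]
gamma_sp = shared_rhs(1.0, lam0, lam1, r_s=[1.0, 1.0], r_sp=[2.0, 0.0])
beta_sp = shared_beta(
    [0.5, 0.5], lam0, lam1,
    W_s=[[1.0, 0.0], [0.0, 1.0]],
    W_sp=[[2.0, 0.0], [0.0, 0.0]],
)
```

Taking the minimum in the right-hand side and the componentwise maximum in the coefficients keeps both branches of the disjunction satisfied for the new scenario, which is the content of the proof above.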


Note that the cuts (12.12a) and (12.12c) preserve the fixed technology 
matrix and recourse matrix structure, respectively, in addition to having the 
same right-hand side. Linear programs similar to (12.11) can be constructed 
to generate valid inequalities when only one of $T$, $W$, and $r$ is deterministic
and the other two are stochastic. 

The above approach is based on solving the deterministic equivalent form 
of the two-stage SMIP. If a subproblem having the set of variables (x, y*) is 
solved for each scenario s sequentially, then the validity of the above cuts still 
holds true. 

When using the L-shaped method (or Benders’ decomposition) to solve 
two-stage stochastic programs, we need the value functions of the Benders 
subproblems (i.e., the second-stage problems) to be convex. However, if the 
second-stage problems contain binary or integer variables, their value func- 
tions are in general nonconvex and discontinuous. This then requires a relax- 
ation of the binary restrictions in the second-stage problems and an accom- 
panying convexification process. 

When solving the second-stage problems, a given solution $\bar{x}$ is available
from the first-stage problem and becomes part of the right-hand sides of the
second-stage problems. Valid inequalities obtained in this context will be in
the form of $\beta_s y^s \ge \gamma_s$. It is helpful to lift these (possibly facet-defining) cuts to be
valid in the $(x, y^s)$ space, in the form of $\alpha_s x + \beta_s y^s \ge \gamma_s + \alpha_s \bar{x}$. By treating
$x$ as a variable, these cuts can be reused in subsequent Benders iterations
by updating the first-stage solutions from the Benders master problem, as
detailed in Sherali and Fraticelli [39]. This cut lifting is not well posed when
$x$ is continuous. If $x$ is restricted to be binary, then the feasible $x$-solutions
are always facial to the convex hull of the two-stage constraint set, and we
can lift a valid inequality using the disjunction

\[
\begin{Bmatrix} Ax \ge b \\ T_s x + W_s y^s \ge r_s \\ -x_i \ge 0 \end{Bmatrix}
\bigvee
\begin{Bmatrix} Ax \ge b \\ T_s x + W_s y^s \ge r_s \\ x_i \ge 1 \end{Bmatrix}. \tag{12.13}
\]


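The lifted cut can be obtained via the solution to the following linear program,
where $\bar\beta_s$ and $\bar\gamma_s$ are cut coefficients already obtained in the $y^s$ space.

\begin{align}
\text{Minimize} \quad & \alpha_s \bar{x} \tag{12.14a} \\
\text{subject to:} \quad & \alpha_s + \varsigma_0 e_i - \pi_0 A - \lambda_0 T_s \ge 0 \tag{12.14b} \\
& \alpha_s - \varsigma_1 e_i - \pi_1 A - \lambda_1 T_s \ge 0 \tag{12.14c} \\
& -\lambda_h W_s \ge -\bar\beta_s, \quad \forall h = 0, 1 \tag{12.14d} \\
& -\alpha_s \bar{x} + \pi_0 b + \lambda_0 r_s \ge \bar\gamma_s \tag{12.14e} \\
& -\alpha_s \bar{x} + \varsigma_1 + \pi_1 b + \lambda_1 r_s \ge \bar\gamma_s \tag{12.14f} \\
& \alpha_s \bar{x} \ge -1 \tag{12.14g} \\
& \pi, \lambda, \varsigma \ge 0, \quad \alpha_s \text{ unrestricted}. \tag{12.14h}
\end{align}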


Also, in the purely binary first-stage problem setting, if additionally the
recourse matrix is fixed, Sen and Higle [35] propose to lift cuts for second-stage
problems in the $(x, y^s)$-space, such that the cuts share the same coefficients
$\beta$ for the $y$-variables. These cuts are then in the form of $\beta y^s \ge \gamma_s - \alpha_s x$.
This property is named common cut coefficients ($C^3$), and is established as
follows.

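Given a first-stage solution $\bar{x}$ and a fixed recourse matrix $W$, consider the
disjunction

\[
\begin{Bmatrix} W y^s \ge r_s - T_s \bar{x} \\ -y^s_q \ge 0 \end{Bmatrix}
\bigvee
\begin{Bmatrix} W y^s \ge r_s - T_s \bar{x} \\ y^s_q \ge 1 \end{Bmatrix}. \tag{12.15}
\]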


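The common cut coefficients $\bar\beta$ are obtained via the optimal value for $\beta$
given by the following linear program, where $\bar{y}^s$, $\forall s \in S$, are the current
second-stage solutions.

\begin{align}
\text{Minimize} \quad & \sum_{s \in S} p_s (\beta \bar{y}^s - \pi_s) \tag{12.16a} \\
\text{subject to:} \quad & \beta + \varsigma_0 e_q - \lambda_0 W \ge 0 \tag{12.16b} \\
& \beta - \varsigma_1 e_q - \lambda_1 W \ge 0 \tag{12.16c} \\
& \lambda_0 (r_s - T_s \bar{x}) - \pi_s \ge 0, \quad \forall s \in S \tag{12.16d} \\
& \varsigma_1 + \lambda_1 (r_s - T_s \bar{x}) - \pi_s \ge 0, \quad \forall s \in S \tag{12.16e} \\
& \sum_{s \in S} p_s (\beta \bar{y}^s - \pi_s) \ge -1, \tag{12.16f} \\
& \pi, \lambda, \varsigma \ge 0, \quad \beta \text{ unrestricted}. \tag{12.16g}
\end{align}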


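Letting $\bar\lambda$ and $\bar\varsigma$ be optimal values for $\lambda$ and $\varsigma$ in Problem (12.16), the cut

\[
\bar\beta y^s \ge \min\{\bar\lambda_0 r_s - \bar\lambda_0 T_s x, \; \bar\lambda_1 r_s + \bar\varsigma_1 - \bar\lambda_1 T_s x\} \tag{12.17}
\]

is then valid for scenario $s$, and its piecewise linear concave right-hand side
can be convexified using its epigraph over the bounded region
$X = \{x \in \mathbb{R}^{|I|} \mid Ax \ge b\}$. Denote this epigraph as

\[
\Pi_s = \{(w, x) \mid x \in X, \; w \ge R^s(x)\}, \tag{12.18}
\]

where $R^s(x) = \min\{\bar\lambda_0 r_s - \bar\lambda_0 T_s x, \; \bar\lambda_1 r_s + \bar\varsigma_1 - \bar\lambda_1 T_s x\}$.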

In this process, for $x \in X$, we can find a lower bound for $R^s(x)$, say,
$L$. Each affine function in $R^s(x)$ is then translated using this bound, if
necessary, to ensure that $w \ge 0$. This will translate the convex hull of the
epigraph as well. After convexification cuts are obtained, they can be
translated back by the same amount to recover the original values. So without
loss of generality, assume that $L = 0$.

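We can then represent $\Pi_s$ as the following disjunction

\[
\begin{Bmatrix} w \ge \bar\lambda_0 r_s - \bar\lambda_0 T_s x \\ Ax \ge b \end{Bmatrix}
\bigvee
\begin{Bmatrix} w \ge \bar\lambda_1 r_s + \bar\varsigma_1 - \bar\lambda_1 T_s x \\ Ax \ge b \end{Bmatrix}. \tag{12.19}
\]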


The right-hand side of (12.17) can then be convexified based on this disjunc- 
tion by solving the following linear program for each scenario s. 


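\begin{align}
\text{Minimize} \quad & \sigma_s \bar{x} + \upsilon_s - \delta_s \tag{12.20a} \\
\text{subject to:} \quad & \sigma_s - \pi_h A - \theta_h (\bar\lambda_h T_s) \ge 0, \quad \forall h = 0, 1 \tag{12.20b} \\
& \upsilon_s - \theta_h \ge 0, \quad \forall h = 0, 1 \tag{12.20c} \\
& \pi_0 b + \theta_0 (\bar\lambda_0 r_s) - \delta_s \ge 0 \tag{12.20d} \\
& \pi_1 b + \theta_1 (\bar\lambda_1 r_s) + \theta_1 \bar\varsigma_1 - \delta_s \ge 0 \tag{12.20e} \\
& \theta_0 + \theta_1 = 1 \tag{12.20f} \\
& \theta, \pi \ge 0, \quad \sigma, \delta, \upsilon \text{ unrestricted}. \tag{12.20g}
\end{align}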


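Note that from (12.20c) and (12.20f), we have that $\upsilon_s > 0$. For an optimal
extreme point solution $(\bar\sigma_s, \bar\upsilon_s, \bar\delta_s)$, we then have
$\mathrm{clconv}(\Pi_s) = \{(w, x) \mid x \in X, \; w \ge (\bar\delta_s/\bar\upsilon_s) - (\bar\sigma_s/\bar\upsilon_s) x\}$.
This completes the convexification of the piecewise linear concave $R^s(x)$.
Letting $\alpha_s = \bar\sigma_s/\bar\upsilon_s$ and $\gamma_s = \bar\delta_s/\bar\upsilon_s$, the inequality

\[
\bar\beta y^s \ge \gamma_s - \alpha_s x \tag{12.21}
\]

is then valid for scenario $s$, $\forall s \in S$, and all scenarios share the same coefficient
$\bar\beta$ for $y$ in the new cuts, which need to be stored only once.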
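
For intuition about this right-hand-side convexification, consider the special case of a single first-stage variable restricted to an interval. There, the convex envelope of a concave piecewise linear function is simply the chord through its two endpoint values, so no linear program is needed. The sketch below covers only this one-dimensional case; the names `concave_min` and `chord_envelope` and the data are assumptions for illustration, and the general multivariate case is what (12.20) addresses.

```python
def concave_min(x, pieces):
    """R(x) = min over affine pieces (a, b) of a + b * x."""
    return min(a + b * x for a, b in pieces)


def chord_envelope(pieces, lo, hi):
    """Convex envelope of the concave function R on [lo, hi] in one dimension.

    Returns (intercept, slope) of the chord through (lo, R(lo)) and (hi, R(hi)).
    In the notation of (12.21), intercept plays the role of gamma_s and
    -slope the role of alpha_s.
    """
    f_lo, f_hi = concave_min(lo, pieces), concave_min(hi, pieces)
    slope = (f_hi - f_lo) / (hi - lo)
    intercept = f_lo - slope * lo
    return intercept, slope


# Hypothetical right-hand side: R(x) = min{x, 3 - 0.5 x} on X = [0, 4].
pieces = [(0.0, 1.0), (3.0, -0.5)]
intercept, slope = chord_envelope(pieces, 0.0, 4.0)
grid = [0.5 * k for k in range(9)]  # 0, 0.5, ..., 4
gaps = [concave_min(x, pieces) - (intercept + slope * x) for x in grid]
```

The chord never exceeds $R(x)$ on the interval and touches it at both endpoints, which is the one-dimensional counterpart of replacing the concave right-hand side of (12.17) by the affine right-hand side of (12.21).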

The disjunctive decomposition ($D^2$) method by Sen and Higle [35] thus
applies Benders’ decomposition to solve two-stage stochastic programs, using
the above $C^3$-scheme embodied by (12.16) and (12.20) to solve the
second-stage subproblems.

The two approaches by Carøe and Tind [13] and Sen and Higle [35] directly
stem from disjunctive programming. Using another approach, the reformu- 
lation-linearization technique (RLT) (see Sherali and Adams [37, 38]), we 
can sequentially construct (partial) convex hulls for the second-stage sub- 
problems. Sherali and Fraticelli [39] proposed a method for solving two-stage 
problems having purely binary first-stage variables using a modified Benders’- 
decomposition scheme, where the subproblems are solved by adding RLT cuts 
generated in the $(x, y^s)$ space. The idea is to store the cuts as functions of $x$,
so that at each Benders iteration, when new x-values are obtained from the 
master (first-stage) problem, the cutting planes obtained previously from the 
second-stage subproblem solutions can be reused in the subsequent subprob- 
lems of the same scenario simply by updating the x-values. 

In a Benders’-decomposition setting, each second-stage problem is solved
independently; hence, we omit the superscript $s$ for $y$-variables when no
confusion arises. Furthermore, denote the $q$th column of matrix $W_s$ as $W_s^q$, and
the matrix formed by the remaining columns of $W_s$ as $\tilde{W}_s$. Correspondingly,
denote variable $y$ without the $q$th element as $\tilde{y}$ and let $\tilde\beta$ be its associated
coefficient; that is, $\beta y = \tilde\beta \tilde{y} + \beta_q y_q$. To derive cuts in the $(x, y)$ space, that is,
to introduce $x$ as a variable into the cut generation process, we include the
bounding constraint $0 \le x \le e$ into the second stage. By the RLT process,
given a solution $(\bar{x}, \bar{y})$, which has $\bar{y}_q$ fractional for some $q \in J_b$, we multiply
$0 \le x \le e$ and the constraints in the second stage by the factors $y_q$ and
$(1 - y_q)$ to obtain a system in a higher-dimensional space $(x, y, z^x, z^y)$,
including the new RLT variables $z^x$ and $z^y$ to represent the resulting nonlinear
products; that is,

\[
z^x = x y_q \quad \text{and} \quad z^y = \tilde{y} y_q. \tag{12.22}
\]


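Denote the resulting system as

\begin{align}
T_s z^x + \tilde{W}_s z^y & \ge (r_s - W_s^q) y_q & & \leftarrow \phi_0 \tag{12.23a} \\
-T_s z^x - \tilde{W}_s z^y & \ge r_s - T_s x - \tilde{W}_s \tilde{y} - r_s y_q & & \leftarrow \phi_1 \tag{12.23b} \\
\Gamma_z z^x & \ge h_b - \Gamma_x x - \Gamma_y y_q & & \leftarrow \phi_b. \tag{12.23c}
\end{align}

The constraints (12.23a) and (12.23b) are obtained by multiplying (12.1g)
(for scenario $s$) by $y_q$ and $(1 - y_q)$, respectively. The constraints (12.23c) are
obtained by multiplying the bounding constraints $0 \le x \le e$ by $y_q$ and
$(1 - y_q)$, where $\Gamma_z$, $\Gamma_x$, and $\Gamma_y$ are used to denote the resulting coefficient
matrices for $z^x$, $x$, and $y_q$, respectively. Associating the dual multipliers $\phi$ as
indicated in (12.23), the solution $(\bar\alpha_s, \bar\beta_s, \bar\gamma_s)$ of the following linear program
yields the valid inequality $\bar\beta_s y \ge \bar\gamma_s - \bar\alpha_s x$ for scenario $s$ in the projected
space of the original variables $(x, y)$.

\begin{align}
\text{Minimize} \quad & \tilde\beta_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s \tag{12.24a} \\
\text{subject to:} \quad & (\phi_0 - \phi_1) T_s + \phi_b \Gamma_z = 0 \tag{12.24b} \\
& (\phi_0 - \phi_1) \tilde{W}_s = 0 \tag{12.24c} \\
& \alpha_s = \phi_1 T_s + \phi_b \Gamma_x \tag{12.24d} \\
& \tilde\beta_s = \phi_1 \tilde{W}_s \tag{12.24e} \\
& \beta_{qs} = \phi_0 (W_s^q - r_s) + \phi_1 r_s + \phi_b \Gamma_y \tag{12.24f} \\
& \gamma_s = \phi_1 r_s + \phi_b h_b \tag{12.24g} \\
& \tilde\beta_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s \ge -1 \tag{12.24h} \\
& \phi \ge 0, \quad \alpha_s, \beta_s, \gamma_s \text{ unrestricted}. \tag{12.24i}
\end{align}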
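
The two RLT product constraints obtained from a single second-stage row can be sanity-checked numerically. The Python sketch below takes one row $t x + \tilde{w} \tilde{y} + w_q y_q \ge r$, forms the products with $y_q$ and $(1 - y_q)$ after substituting $z^x = x y_q$ and $z^y = \tilde{y} y_q$ (using $y_q^2 = y_q$ for binary $y_q$), and tests whether both hold at a given point. The function name `rlt_rows_hold` and the data are hypothetical; the sketch does not solve the cut-generation program (12.24).

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def rlt_rows_hold(t, w_rest, w_q, r, x, y_rest, y_q, tol=1e-9):
    """Check the y_q- and (1 - y_q)-products of one row, cf. (12.23a)-(12.23b).

    t, w_rest : row coefficients of x and of y without its q-th entry
    w_q, r    : coefficient of y_q and right-hand side of the row
    """
    zx = [xi * y_q for xi in x]        # z^x = x * y_q
    zy = [yi * y_q for yi in y_rest]   # z^y = y_rest * y_q
    # Row multiplied by y_q:       t z^x + w_rest z^y >= (r - w_q) y_q
    holds_a = dot(t, zx) + dot(w_rest, zy) >= (r - w_q) * y_q - tol
    # Row multiplied by (1 - y_q): -t z^x - w_rest z^y
    #                              >= r - t x - w_rest y_rest - r y_q
    rhs_b = r - dot(t, x) - dot(w_rest, y_rest) - r * y_q
    holds_b = -dot(t, zx) - dot(w_rest, zy) >= rhs_b - tol
    return holds_a and holds_b


# Hypothetical row: x_1 + x_2 + y_1 + 2 y_q >= 2.
row = dict(t=[1.0, 1.0], w_rest=[1.0], w_q=2.0, r=2.0)
ok_q1 = rlt_rows_hold(x=[1.0, 0.0], y_rest=[0.5], y_q=1.0, **row)
ok_q0 = rlt_rows_hold(x=[1.0, 1.0], y_rest=[0.2], y_q=0.0, **row)
bad_q0 = rlt_rows_hold(x=[0.0, 0.0], y_rest=[0.5], y_q=0.0, **row)
```

For binary $y_q$, exactly one of the two products reproduces the original row and the other is trivially satisfied, so points feasible to the row pass and infeasible points fail.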


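Proposition 12.2. If the technology matrix $T$ and recourse matrix $W$ are
fixed, and $\tilde{\bar\beta}_s \tilde{y} + \bar\beta_{qs} y_q \ge \bar\gamma_s - \bar\alpha_s x$ is a valid inequality for scenario $s$ obtained
from (12.24), then

\[
\tilde{\bar\beta}_s \tilde{y} + \beta_{qs'} y_q \ge \gamma_{s'} - \bar\alpha_s x \tag{12.25}
\]

is valid for scenario $s'$, where $\beta_{qs'} = \bar\beta_{qs} + \bar\phi_0 (r_s - r_{s'})$ and
$\gamma_{s'} = \bar\gamma_s + \bar\phi_1 (r_{s'} - r_s)$.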


Proof. Follows from the feasibility of (12.24b)—(12.24i). 


Note that the cut (12.25) is valid for scenario s’, but it disturbs the fixed 
recourse structure. Hence, if applied, it should not be included in the cut gen- 
eration process in later iterations, which should continue to use the fixed W. 
However, ultimately, to obtain a convexification of the second-stage problem, 
such cuts would need to be included in the fashion discussed by Sherali and 
Fraticelli [39]. The following proposition provides a way to derive valid in- 
equalities and retain a fixed recourse, in a more general setting of a stochastic 
technology matrix. 


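Proposition 12.3. Let the recourse matrix $W$ be fixed, and the technology
matrix $T_s$ and right-hand side $r_s$ be stochastic. Denote $(\bar\alpha_s, \bar\beta_s, \bar\gamma_s, \bar\phi)$ as an
optimal solution obtained for Problem (12.24). Solve the following linear
program corresponding to another scenario $s' \ne s$.

\begin{align}
\text{Minimize} \quad & 0 \tag{12.26a} \\
\text{subject to:} \quad & \phi_0 T_{s'} = \bar\phi_1 T_{s'} - \bar\phi_b \Gamma_z \tag{12.26b} \\
& \phi_0 \tilde{W} = \bar\phi_1 \tilde{W} \tag{12.26c} \\
& \phi_0 (W^q - r_{s'}) = \bar\phi_0 (W^q - r_s) \tag{12.26d} \\
& \phi_0 \ge 0. \tag{12.26e}
\end{align}

If (12.26) is feasible, then

\[
\bar\beta_s y \ge \gamma_{s'} - \alpha_{s'} x \tag{12.27a}
\]

is a valid inequality for scenario $s'$, where

\[
\alpha_{s'} = \bar\phi_1 T_{s'} + \bar\phi_b \Gamma_x \quad \text{and} \quad \gamma_{s'} = \bar\phi_1 r_{s'} + \bar\phi_b h_b. \tag{12.27b}
\]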


Proof. The proof again follows from the feasibility to (12.24b)-(12.24i) for
scenario $s'$ by revising $\bar\phi_0$ to $\phi_0$ and retaining $\phi_1 = \bar\phi_1$ and $\phi_b = \bar\phi_b$,
where (12.24e) remains the same, and (12.24d) and (12.24g) are satisfied
by (12.27b). Furthermore, (12.24b), (12.24c), and (12.24f) are satisfied by
(12.26b), (12.26c), and (12.26d), respectively, and the normalization
constraint (12.24h) is inconsequential.


The purpose of constraint (12.26d) is to retain the same $\beta_q$ value for all
new cuts generated, so as to keep the fixed recourse structure if the cuts are
to be inherited. It can be replaced by other normalization constraints, and a
valid cut can be generated using the correspondingly computed $(\alpha_{s'}, \beta_{s'}, \gamma_{s'})$
values. However, we may no longer have $\beta_{s'} = \bar\beta_s$ in the resulting cut, thereby
losing the fixed recourse structure.

In a Benders’-decomposition context where subproblems are solved using a 
cutting plane method, and where these cutting planes are valid in the $(x, y)$-space
for $\mathrm{conv}\{(x, y) \mid T_s x + W y \ge r_s, \; 0 \le x \le e, \; y \ge 0, \; y_j \in \{0, 1\}, \; \forall j \in J_b\}$,
we can append these cuts along with any possibly added bounding constraints
for the $y$-variables to $T_s x + W y \ge r_s$, and Sherali and Fraticelli [39] have
shown that Benders cuts generated using the dual solution of the augmented 
LP relaxation system are valid optimality cuts in terms of the first-stage 
variables. 

Using this idea and disjunctive programming, Sen and Sherali [36] develop
a disjunctive decomposition-based branch-and-cut ($D^2$-BAC) approach,
where subproblems are partially solved using branch-and-cut, and where op- 
timality cuts applied in the first-stage master problem are generated using 
the disjunctive programming concept. In the B&B tree for any subproblem 
in this process, each node is an LP relaxation of the subproblem that in- 
cludes some node-dependent bounding constraints for y. Assuming that all 
the nodes are associated with feasible LP relaxations and are fathomed only 
by the bound computations, then at least one terminal node corresponds 
to an optimal solution. Let Q, denote the set of terminal nodes of the tree 
that have been explored for the subproblem for scenario s, and let zs and 
Zqus denote the lower and upper bounds for y in the nodal subproblem q, 
for q € Qs. Let Ags, Ugis; and Wqus be dual solutions associated with the 
constraint set Wy > r, — Tx and the lower and upper bounding constraints 
for y, respectively, in the nodal subproblem for node q. We then obtain the 
following disjunction: 


\[
\eta \ge \lambda_{qs}[r_s - T_s x] + \psi_{qls} z_{qls} - \psi_{qus} z_{qus} \quad \text{for at least one } q \in Q_s. \tag{12.28}
\]


Similar to the convexification process for (12.17), we can use the following
disjunction to generate a disjunctive cut for (12.28),

\[
\bigvee_{q \in Q_s} \left\{ \begin{array}{l} \eta \ge \lambda_{qs}[r_s - T_s x] + \psi_{qls} z_{qls} - \psi_{qus} z_{qus} \\ A x \ge b \end{array} \right\}. \tag{12.29}
\]


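Denote $\bar{\eta}$ as a current estimated lower bound for the second-stage value
function, possibly obtained from previous iterations, and let
$(\bar{\sigma}_s, \bar{\sigma}_\eta, \bar{\upsilon}_s)$ be an optimal extreme point solution of the following
linear program.

\[
\begin{array}{lll}
\text{Minimize} & \sigma_s \bar{x} + \sigma_\eta \bar{\eta} - \upsilon_s & (12.30a) \\
\text{subject to:} & \sigma_s - \tau_q A - \theta_q \lambda_{qs} T_s \ge 0, \quad \forall q \in Q_s & (12.30b) \\
& \sigma_\eta - \theta_q \ge 0, \quad \forall q \in Q_s & (12.30c) \\
& \tau_q b + \theta_q (\lambda_{qs} r_s + \psi_{qls} z_{qls} - \psi_{qus} z_{qus}) - \upsilon_s \ge 0, \quad \forall q \in Q_s & (12.30d) \\
& \sum_{q \in Q_s} \theta_q = 1 & (12.30e) \\
& \theta, \tau \ge 0; \quad \sigma_s, \sigma_\eta, \upsilon_s \text{ unrestricted.} & (12.30f)
\end{array}
\]

Again, due to (12.30c) and (12.30e), we have that $\bar{\sigma}_\eta > 0$. A disjunctive cut
that provides a lower bound on the second-stage value function can then be
obtained in the form of $\eta \ge \bar{\gamma}_s - \bar{\alpha}_s x$, where
$\bar{\alpha}_s = \bar{\sigma}_s / \bar{\sigma}_\eta$ and $\bar{\gamma}_s = \bar{\upsilon}_s / \bar{\sigma}_\eta$.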


12.3.2 Pure Continuous and Mixed 0-1 First-Stage
Problems


Following the cutting plane game concept of Jeroslow [18], the disjunctive
and RLT cut generation processes finitely solve the second-stage subproblems.
Therefore, finite convergence of the D² algorithm of Sen and Higle [35], and of
the modified Benders’-decomposition algorithm of Sherali and Fraticelli [39],
is afforded by the finite number of feasible first-stage solutions and by the
finite cutting plane generation processes for solving subproblems. If the first
stage contains continuous variables, however, a feasible first-stage solution $\bar{x}$
may not be facial with respect to its bounding constraints, and the algorithms
of Section 12.3.1 would no longer assure convergence.

To retain the facial property of the first-stage solutions, Ntaimo and 
Sen [27] and Sherali and Zhu [41] propose to build a B&B tree via a par- 
titioning process in the projected space of the bounded first-stage variables 
so as to ultimately induce the aforementioned facial property. 

Using the D² algorithm, when $\bar{x}$ is at a vertex of its bounding region, for
the right-hand side of the cut (12.21) generated by (12.16) and (12.20), we
will have


Fs — ou = min{ Agr, — ApT.2, rs + — ArTez}(= R(x); — (12.31) 
otherwise, this equality may be violated. Ntaimo and Sen [27] then propose a
finitely convergent D²-based branch-and-cut (D²-CBAC) algorithm for
problems containing purely continuous first-stage variables. This algorithm
constructs a B&B tree defined on the first-stage feasible region, applies the D²
algorithm at each node, selects some scenario $s$ and iteration $k$ such that
(12.31) is most violated, and partitions based on the disjunction prompted
by (12.31), enforcing Aor; — AoTs@ > Airs +S, — A1Tsa on one child node 
and Apr, — AoTsu < Airs +5, — A1T-x on the other. The convexification 
cut (12.21) at the parent node is accordingly updated to By > AKr® — ARTF x 
and By > Akr, + sk — APT Fx, at the two child nodes, respectively. 

Because there are finitely many disjunctive variables in the second stage,
for some iteration of the embedded D² algorithm, there are a finite number of
right-hand sides of (12.17) that can be constructed; hence, there are finitely
many partitions of the first-stage feasible region to consider, which leads to
finite convergence of the D²-CBAC algorithm.

If the first stage further contains both continuous and binary variables,
Sherali and Zhu [41] propose a decomposition-based B&B (DBAB) algorithm
that is guaranteed to converge to an optimal solution. They assume relative
complete recourse with respect to some bounding hyperrectangle of $x$. The
branch-and-bound tree is again defined on the projected space of the bounded
first-stage variables, where lower bounds for the nodal problems are computed
by applying a modified Benders’ method extended from Sherali and
Fraticelli [39], but defined on a subdivision of the original bounding hyperrectangle
for $x$, and the Benders subproblems are derived based on partial convex hull
representations in the $(x, y^s)$-spaces using the second-stage constraints and
the current bounding constraints for $x$. For some given feasible first-stage
solution $\bar{x}$, because $x = \bar{x}$ may not be facial with respect to its bounding
constraints, the Benders subproblems are shown to define lower bounds for
second-stage value functions. Therefore, any resulting Benders master
problem provides a lower bound for the original stochastic program defined over
the same hyperrectangle, and yields the same objective value if $\bar{x}$ is a vertex
of the defining region.

In the branch-and-bound process of the DBAB algorithm, a node yielding
the least lower bound is selected for branching at each partitioning step.
Hence, the nodal objective value provides a lower bound for the original
two-stage problem. In the partitioning step, the first-stage continuous and binary
variables are dealt with differently. A variable $x_p$ whose current value is most
in-between its bounds is selected as the partitioning variable. If $p \in I_b$, then
$x_p$ is fixed at 0 and 1 in the two child nodes, respectively; otherwise, $x_p$ is a
continuous variable, and its current value $\bar{x}_p$ (or the midpoint of the current
bounding interval for $x_p$) is used as the lower and upper bounds for the two
child nodes, respectively. Therefore, barring finite convergence, along any
infinite branch of the branch-and-bound tree, there will exist a subsequence
of the selected nodes such that the bounding interval for some continuous
variable $x_p$ is partitioned infinitely many times and such that the limiting
value $\bar{x}_p$ coincides with one of its bounds. As $x_p$ was selected whenever its
value was most in-between its bounds, in the limit, all $\bar{x}_i$, $\forall i = 1, \ldots, n_1$,
would coincide with one of their bounds, and hence, $\bar{x}$ would be a vertex of
the limiting bounding hyperrectangle, thereby providing an upper bounding
solution for the original two-stage stochastic program. Together with the node
selection rule, the partitioning process therefore guarantees convergence of
the algorithm to an optimal solution.
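
The partitioning rule just described can be sketched in a few lines of Python. This is only an illustration, not the DBAB implementation of [41]: the function names, the list-based representation of a hyperrectangle, and the use of min(x_p - l_p, u_p - x_p) as the measure of how far a value lies in-between its bounds are assumptions made here.

```python
from typing import List, Optional, Set


def select_partition_variable(x_bar: List[float], lower: List[float],
                              upper: List[float]) -> Optional[int]:
    """Index p whose value lies most in-between its bounds.

    Assumed measure: the smaller of the two distances to the bounds.
    Returns None when x_bar is a vertex of the box.
    """
    best_p, best_gap = None, 0.0
    for p, (x, lo, up) in enumerate(zip(x_bar, lower, upper)):
        gap = min(x - lo, up - x)
        if gap > best_gap:
            best_p, best_gap = p, gap
    return best_p


def branch(x_bar: List[float], lower: List[float], upper: List[float],
           binary_idx: Set[int], use_midpoint: bool = False):
    """Split the box [lower, upper] into two child boxes."""
    p = select_partition_variable(x_bar, lower, upper)
    if p is None:
        return None  # x_bar is already a vertex; nothing to partition
    lo1, up1 = list(lower), list(upper)
    lo2, up2 = list(lower), list(upper)
    if p in binary_idx:
        # Binary variable: fix at 0 in one child and at 1 in the other.
        lo1[p] = up1[p] = 0.0
        lo2[p] = up2[p] = 1.0
    else:
        # Continuous variable: split at the current value or the midpoint.
        split = 0.5 * (lower[p] + upper[p]) if use_midpoint else x_bar[p]
        up1[p] = split
        lo2[p] = split
    return p, (lo1, up1), (lo2, up2)
```

Calling `branch` repeatedly on the node with the least lower bound mirrors the node selection and partitioning steps described above; the lower bounding (Benders) computations are deliberately left out.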

A difficulty in directly implementing the modified Benders’ method by
Sherali and Fraticelli [39] at the nodal problems arises due to the fact that
when $\bar{x}$ is not extremal with respect to its bounds, the solution $\bar{y}$ obtained
for a Benders subproblem defined on a partial convex hull representation in
the $(x, y)$-space may not satisfy its binary restrictions. We thus need to be
able to detect whether a Benders subproblem is already solved by such a
fractional $\bar{y}$. This can be achieved as follows. If $\bar{y}_j \in \{0,1\}$, $\forall j \in J_b$, or if
$(\bar{x}, \bar{y})$ can be represented as a convex combination of some extreme points of
the current partial convex hull defining the Benders subproblem such that
these extreme points have binary $y_j$-variables for all $j \in J_b$, then $\bar{y}$ solves the
Benders subproblem. Sherali and Zhu [41] describe a procedure to check this
situation in their overall algorithmic scheme.

The RLT cuts generated for any given subhyperrectangle are reusable by
updating the $\bar{x}$-values in subsequent Benders iterations at the same node,
and are also inheritable by the subproblems of the child nodes. Likewise, the 
Benders cuts derived for a given subhyperrectangle can also be inherited by 
the lower bounding master programs solved for its child nodes. 


12.3.3 Connections Between Disjunctive Cuts and 
RLT Cuts 


In this section, we demonstrate the connections between the two types of con- 
vexification cuts, namely, the disjunctive cuts generated by Problems (12.16) 
and (12.20) (or the counterpart of problem (12.9) in a stochastic setting), 
and the RLT cuts generated by Problem (12.24) under fixed recourse. We 
discuss these cuts in the context of fixed recourse stochastic programs hav- 
ing continuous first-stage variables, and the analysis is similar for the case of 
discrete first-stage variables. 

Let $x$ be continuous and bounded by $l \le x \le u$ at some node of the B&B
process performed on the projected $x$-space as described in Section 12.3.2.
Upon multiplying the constraints

\[
T_s x + \tilde{W} \tilde{y} + W^q y_q \ge r_s \quad \text{and} \quad l \le x \le u \tag{12.32}
\]

by $y_q$ and $(1 - y_q)$, and linearizing upon substituting the resulting nonlinear
terms using (12.22), the higher-dimensional system (12.23) obtained is as
follows, with the associated multipliers shown on the right.

\[
\begin{array}{rlll}
T_s z^x + \tilde{W} z^y & \ge (r_s - W^q) y_q & \leftarrow \phi_0 & (12.33a) \\
-T_s z^x - \tilde{W} z^y & \ge r_s - T_s x - \tilde{W} \tilde{y} - r_s y_q & \leftarrow \phi_1 & (12.33b) \\
z^x & \ge l y_q & \leftarrow \phi_{xl0} & (12.33c) \\
-z^x & \ge -u y_q & \leftarrow \phi_{xu0} & (12.33d) \\
-z^x & \ge -l y_q + l - x & \leftarrow \phi_{xl1} & (12.33e) \\
z^x & \ge u y_q - u + x & \leftarrow \phi_{xu1} & (12.33f)
\end{array}
\]

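The cut generation problem as a counterpart to (12.24) then takes the
following form.

\[
\begin{array}{lll}
\text{Minimize} & \tilde{\beta}_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s & (12.34a) \\
\text{subject to:} & (\phi_0 - \phi_1) T_s + \phi_{xl0} - \phi_{xu0} - \phi_{xl1} + \phi_{xu1} = 0 & (12.34b) \\
& (\phi_0 - \phi_1) \tilde{W} = 0 & (12.34c) \\
& \alpha_s = \phi_1 T_s + \phi_{xl1} - \phi_{xu1} & (12.34d) \\
& \tilde{\beta}_s = \phi_1 \tilde{W} & (12.34e) \\
& \beta_{qs} = \gamma_s + \phi_0 (W^q - r_s) - \phi_{xl0} l + \phi_{xu0} u & (12.34f) \\
& \gamma_s = \phi_1 r_s + \phi_{xl1} l - \phi_{xu1} u & (12.34g) \\
& \tilde{\beta}_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s \ge -1 & (12.34h) \\
& \phi \ge 0; \quad \alpha_s, \tilde{\beta}_s, \beta_{qs}, \gamma_s \text{ unrestricted,} & (12.34i)
\end{array}
\]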

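where $(\bar{x}, \bar{y})$ is the current solution having $\bar{y}_q$, $q \in J_b$, fractional. Note that
the resulting cut is of the form

\[
\bar{\alpha}_s x + \bar{\tilde{\beta}}_s \tilde{y} + \bar{\beta}_{qs} y_q \ge \bar{\gamma}_s, \tag{12.35}
\]

where $(\bar{\alpha}_s, \bar{\tilde{\beta}}_s, \bar{\beta}_{qs}, \bar{\gamma}_s, \bar{\phi})$ solves Problem (12.34).


Proposition 12.4. Problem (12.34) is equivalent to

\[
\begin{array}{lll}
\text{Minimize} & \tilde{\beta}_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s & (12.36a) \\
\text{subject to:} & \alpha_s \ge \phi_0 T_s + \phi_{xl0} - \phi_{xu0} & (12.36b) \\
& \alpha_s \ge \phi_1 T_s + \phi_{xl1} - \phi_{xu1} & (12.36c) \\
& \tilde{\beta}_s \ge \phi_0 \tilde{W} & (12.36d) \\
& \tilde{\beta}_s \ge \phi_1 \tilde{W} & (12.36e) \\
& \beta_{qs} \ge \gamma_s + \phi_0 (W^q - r_s) - \phi_{xl0} l + \phi_{xu0} u & (12.36f) \\
& \gamma_s \le \phi_1 r_s + \phi_{xl1} l - \phi_{xu1} u & (12.36g) \\
& \tilde{\beta}_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s \ge -1 & (12.36h) \\
& \phi \ge 0; \quad \alpha_s, \tilde{\beta}_s, \beta_{qs}, \gamma_s \text{ unrestricted.} & (12.36i)
\end{array}
\]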


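Proof. Noting the objective coefficients of $\beta_{qs}$ and $\gamma_s$ in Problem (12.34), we
can equivalently write the equalities (12.34f) and (12.34g) as the inequalities
(12.36f) and (12.36g), respectively, as these constraints will always hold
as equalities in any optimal solution. Substituting for the terms involving
$\phi_1$ in (12.34d) and (12.34e) into (12.34b) and (12.34c), respectively, yields
$\alpha_s = \phi_0 T_s + \phi_{xl0} - \phi_{xu0}$ and $\tilde{\beta}_s = \phi_0 \tilde{W}$, that is, (12.36b) and (12.36d) in
equality forms.

To complete the proof, we now show that at an optimal solution to (12.36),
constraints (12.36b)–(12.36e) will hold as equalities as in (12.34b)–(12.34e).
Suppose that, on the contrary, we have an optimal solution

\[
(\hat{\alpha}, \hat{\tilde{\beta}}, \hat{\beta}_{qs}, \hat{\gamma}_s, \hat{\phi}_{xl1}, \hat{\phi}_{xl0}, \hat{\phi}_{-}),
\]

where $\hat{\phi}_{-}$ represents the vector $\hat{\phi}$ except for the elements $\hat{\phi}_{xl1}$ and $\hat{\phi}_{xl0}$, such
that

\[
\begin{array}{ll}
\hat{\alpha}_{si} = \hat{\phi}_0 T_{si} + \hat{\phi}_{xl0i} - \hat{\phi}_{xu0i} > \hat{\phi}_1 T_{si} + \hat{\phi}_{xl1i} - \hat{\phi}_{xu1i}, \quad \text{or} & (12.37a) \\
\hat{\alpha}_{si} = \hat{\phi}_1 T_{si} + \hat{\phi}_{xl1i} - \hat{\phi}_{xu1i} > \hat{\phi}_0 T_{si} + \hat{\phi}_{xl0i} - \hat{\phi}_{xu0i} & (12.37b)
\end{array}
\]

for some $i \in I$, where $T_{si}$ is the $i$th column of $T_s$.
If (12.37a) occurs, then let
$\epsilon_{xi} = [\hat{\phi}_0 T_{si} + \hat{\phi}_{xl0i} - \hat{\phi}_{xu0i}] - [\hat{\phi}_1 T_{si} + \hat{\phi}_{xl1i} - \hat{\phi}_{xu1i}] > 0$.
Obtain a new solution $(\hat{\alpha}, \hat{\tilde{\beta}}, \bar{\beta}_{qs}, \bar{\gamma}_s, \bar{\phi}_{xl1}, \hat{\phi}_{xl0}, \hat{\phi}_{-})$ such that

\[
\bar{\phi}_{xl1i} = \hat{\phi}_{xl1i} + \epsilon_{xi}, \quad \bar{\phi}_{xl1j} = \hat{\phi}_{xl1j}, \ \forall j \ne i, \quad
\bar{\gamma}_s = \hat{\gamma}_s + l_i \epsilon_{xi}, \quad \text{and} \quad \bar{\beta}_{qs} = \hat{\beta}_{qs} + l_i \epsilon_{xi}.
\]

This new solution is feasible, satisfies
$\hat{\alpha}_{si} = \hat{\phi}_0 T_{si} + \hat{\phi}_{xl0i} - \hat{\phi}_{xu0i} = \hat{\phi}_1 T_{si} + \bar{\phi}_{xl1i} - \hat{\phi}_{xu1i}$,
and reduces the objective value by $\epsilon_{xi} l_i (1 - \bar{y}_q) > 0$. Hence, if
(12.37a) holds, $(\hat{\alpha}, \hat{\tilde{\beta}}, \hat{\beta}_{qs}, \hat{\gamma}_s, \hat{\phi}_{xl1}, \hat{\phi}_{xl0}, \hat{\phi}_{-})$ cannot be an optimal solution. On
the other hand, suppose that (12.37b) occurs. Then let
$\epsilon_{xi} = [\hat{\phi}_1 T_{si} + \hat{\phi}_{xl1i} - \hat{\phi}_{xu1i}] - [\hat{\phi}_0 T_{si} + \hat{\phi}_{xl0i} - \hat{\phi}_{xu0i}] > 0$.
Similarly, we can obtain a new solution
$(\hat{\alpha}, \hat{\tilde{\beta}}, \bar{\beta}_{qs}, \hat{\gamma}_s, \hat{\phi}_{xl1}, \bar{\phi}_{xl0}, \hat{\phi}_{-})$ such that

\[
\bar{\phi}_{xl0i} = \hat{\phi}_{xl0i} + \epsilon_{xi}, \quad \bar{\phi}_{xl0j} = \hat{\phi}_{xl0j}, \ \forall j \ne i, \quad \text{and} \quad \bar{\beta}_{qs} = \hat{\beta}_{qs} - l_i \epsilon_{xi}.
\]

Again, this new solution is feasible, satisfies
$\hat{\alpha}_{si} = \hat{\phi}_1 T_{si} + \hat{\phi}_{xl1i} - \hat{\phi}_{xu1i} = \hat{\phi}_0 T_{si} + \bar{\phi}_{xl0i} - \hat{\phi}_{xu0i}$,
and reduces the objective value by $\epsilon_{xi} l_i \bar{y}_q > 0$. Hence,
if (12.37b) holds, then $(\hat{\alpha}, \hat{\tilde{\beta}}, \hat{\beta}_{qs}, \hat{\gamma}_s, \hat{\phi}_{xl1}, \hat{\phi}_{xl0}, \hat{\phi}_{-})$ cannot be an optimal solution,
either. Therefore, we always have (12.36b) and (12.36c) tight at an optimal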
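solution. Similarly, (12.36d) and (12.36e) always hold as equalities in an
optimal solution.

Now, consider the disjunctive cut generation where $Ax \ge b$ contains only
the bounding constraints $l \le x \le u$ for $x$. Applying $y_q \in \{0,1\}$ in the
system (12.32), we obtain the following disjunction for scenario $s$.

\[
\left[ \begin{array}{rl}
\phi_1 \rightarrow & T_s x + \tilde{W} \tilde{y} \ge r_s \\
\phi_{xl1} \rightarrow & x \ge l \\
\phi_{xu1} \rightarrow & -x \ge -u \\
\lambda_1 \rightarrow & -y_q \ge 0
\end{array} \right]
\bigvee
\left[ \begin{array}{ll}
T_s x + \tilde{W} \tilde{y} \ge r_s - W^q & \leftarrow \phi_0 \\
x \ge l & \leftarrow \phi_{xl0} \\
-x \ge -u & \leftarrow \phi_{xu0} \\
y_q \ge 1 & \leftarrow \lambda_0
\end{array} \right]
\tag{12.38}
\]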


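Using the associated multipliers, a disjunctive cut can be generated by
solving the following problem.

\[
\begin{array}{lll}
\text{Minimize} & \tilde{\beta}_s \bar{\tilde{y}} + \beta_{qs} \bar{y}_q + \alpha_s \bar{x} - \gamma_s & (12.39a) \\
\text{subject to:} & \text{constraints (12.36b)–(12.36e) and (12.36h)} & (12.39b) \\
& \beta_{qs} \ge -\lambda_1 & (12.39c) \\
& \beta_{qs} \ge \lambda_0 & (12.39d) \\
& \gamma_s \le \phi_0 (r_s - W^q) + \phi_{xl0} l - \phi_{xu0} u + \lambda_0 & (12.39e) \\
& \gamma_s \le \phi_1 r_s + \phi_{xl1} l - \phi_{xu1} u & (12.39f) \\
& \phi, \lambda \ge 0; \quad \alpha_s, \tilde{\beta}_s, \beta_{qs}, \gamma_s \text{ unrestricted.} & (12.39g)
\end{array}
\]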


Because $j; appears only in constraints (12.39c) and (12.39d), we will have 
$\beta_{qs} = \lambda_0 \ge -\lambda_1$ in an optimal solution. Constraints (12.39c)–(12.39e) then
directly reduce to (12.36f), and Problem (12.39) is exactly the same as
Problem (12.36). Therefore, the cut (12.35) generated using an RLT process is
indeed also a disjunctive cut.

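Applying the C³ result from Sen and Higle [35], we can then generate
disjunctive cuts that are valid for the disjunctions (12.38) for all scenarios,
so that the cuts contain a common coefficient $\beta$ for $y$. Similar to (12.16), the
cut coefficients can be obtained by collecting the coefficient matrices of all
scenarios and solving the following problem.

\[
\begin{array}{lll}
\text{Minimize} & \sum_{s} p_s (\tilde{\beta} \bar{\tilde{y}}^s + \beta_q \bar{y}_q^s + \alpha_s \bar{x} - \gamma_s) & (12.40a) \\
\text{subject to:} & \alpha_s \ge \phi_{hs} T_s + \phi_{xlhs} - \phi_{xuhs}, \quad \forall h = 0, 1, \ \forall s \in S & (12.40b) \\
& \tilde{\beta} \ge \phi_{hs} \tilde{W}, \quad \forall h = 0, 1, \ \forall s \in S & (12.40c) \\
& \beta_q \ge \gamma_s + \phi_{0s} (W^q - r_s) - \phi_{xl0s} l + \phi_{xu0s} u, \quad \forall s \in S & (12.40d) \\
& \gamma_s \le \phi_{1s} r_s + \phi_{xl1s} l - \phi_{xu1s} u, \quad \forall s \in S & (12.40e) \\
& \sum_{s} p_s (\tilde{\beta} \bar{\tilde{y}}^s + \beta_q \bar{y}_q^s + \alpha_s \bar{x} - \gamma_s) \ge -1 & (12.40f) \\
& \phi \ge 0; \quad \alpha_s, \tilde{\beta}, \beta_q, \gamma_s \text{ unrestricted.} & (12.40g)
\end{array}
\]

For an optimal solution $(\bar{\tilde{\beta}}, \bar{\beta}_q, \bar{\alpha}_s, \bar{\gamma}_s, \forall s)$, the inequalities
$\bar{\tilde{\beta}} \tilde{y} + \bar{\beta}_q y_q \ge \bar{\gamma}_s - \bar{\alpha}_s x$, $\forall s \in S$,
are then valid inequalities for disjunctions (12.38) for all scenarios $s \in S$, and
they share the same coefficient $\bar{\beta} = (\bar{\tilde{\beta}}, \bar{\beta}_q)$ for $y$.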

We close this section by emphasizing that the efficacy of cutting plane
methods depends on problem structure. They are most efficient when tight
representations are available using relatively few cuts. Ntaimo and Sen [26]
have successfully applied the disjunctive decomposition (D²) method ([35])
to stochastic server location problems having as many as 1,000,010 binary
variables and 120,010 constraints, using a Sun 280R with UltraSPARC-III+
CPUs running at 900 MHz. For most of the nontrivial problems, the
benchmark commercial MIP software package CPLEX 7.0 failed to solve the
problems. Another reason this method is so effective is the cut
coefficient sharing property. As we have mentioned, in practice, cut coefficient
sharing and reuse are important in saving computation, and greatly enhance
the efficacy of cutting plane methods.


12.4 Structural Enumeration Using a Fixed Technology 
Matrix 


In this section, we assume the following.

A1. The first-stage feasible region $X$ is bounded, and thus compact.
A2. For any $x \in X$, the second-stage problems are feasible (relative complete
recourse).
A3. For any $x \in X$, the second-stage value functions are bounded.
A4. The second-stage variables are purely integer; that is, $y \in \mathbb{Z}_+^{n_2}$.
A5. The technology matrix $T$ is fixed (i.e., deterministic).
A6. All elements in the recourse matrices $W_s$ are integral. (Rational elements
can be scaled to obtain integral elements.)


Assumption A1 assures finite termination for the enumeration schemes
used in the algorithms in this section. Assumptions A2 and A3 imply that
$f_s(x)$ and $u(r_s - Tx)$, $\forall u \in \mathbb{R}_+^{m_2}$, are finite. Using Assumption A5, we can
transform the second-stage value functions as defined on the space of the
so-called “tender variables,” $-Tx$, which we denote as $\chi$. That is,

\[
F_s(\chi) = \min\{g_s y \mid W_s y \ge r_s + \chi,\ y \in \mathbb{Z}_+^{n_2}\} = f_s(x). \tag{12.41}
\]

From Assumptions A4 and A6, we have that $W_s y \ge t$ implies $W_s y \ge \lceil t \rceil$,
where $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$, respectively, denote the componentwise rounded-up and
rounded-down forms of a vector. Therefore, we have that $f_s(x)$ is constant
on the sets

\[
\{x \in \mathbb{R}_+^{n_1} : \lceil r_s - Tx \rceil = \kappa\} = \{x : \kappa - r_s - e < -Tx \le \kappa - r_s\}, \quad \forall \kappa \in \mathbb{Z}^{m_2}, \tag{12.42a}
\]

and $F_s(\chi)$ is constant on the hyperrectangles

\[
\prod_{j=1}^{m_2} (\kappa_j - r_{sj} - 1,\ \kappa_j - r_{sj}], \quad \forall \kappa \in \mathbb{Z}^{m_2}. \tag{12.42b}
\]

Therefore, for some integral vector
$k = (k_{11}, \ldots, k_{sj}, \ldots, k_{|S| m_2})^T \in \mathbb{Z}^{|S| m_2}$,
the expectation $Q(x) = \sum_s p_s f_s(x)$ is constant over the intersection of the
sets (12.42a):

\[
C(k) = \bigcap_{s \in S} \bigcap_{j=1}^{m_2} \{x : k_{sj} - r_{sj} - 1 < -T_j x \le k_{sj} - r_{sj}\}, \quad \forall k \in \mathbb{Z}^{|S| m_2}, \tag{12.43a}
\]

and $\mathcal{Q}(\chi) = \sum_s p_s F_s(\chi)$ is constant over the intersection of the
hyperrectangles (12.42b):

\[
\mathcal{C}(k) = \bigcap_{s \in S} \prod_{j=1}^{m_2} (k_{sj} - r_{sj} - 1,\ k_{sj} - r_{sj}], \quad \forall k \in \mathbb{Z}^{|S| m_2}. \tag{12.43b}
\]
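
The piecewise-constant structure in (12.42) can be observed on a toy instance. The sketch below evaluates $F_s(\chi)$ by brute-force enumeration and reports the cell index $\kappa = \lceil r_s + \chi \rceil$; two tender vectors with the same $\kappa$ must give the same value. The function names, the enumeration cap `y_max`, and the data in the usage note are assumptions made for illustration, and enumeration is of course only viable for tiny problems.

```python
from itertools import product
from math import ceil


def second_stage_value(chi, W, r, g, y_max=6):
    """Brute-force F_s(chi) = min{ g y : W y >= r + chi, y integer, 0 <= y <= y_max }.

    W is an integral matrix given as a list of rows (Assumption A6).
    Returns None if no enumerated y is feasible.
    """
    best = None
    for y in product(range(y_max + 1), repeat=len(g)):
        feasible = all(
            sum(w * yj for w, yj in zip(row, y)) >= r_i + c_i
            for row, r_i, c_i in zip(W, r, chi)
        )
        if feasible:
            value = sum(gj * yj for gj, yj in zip(g, y))
            if best is None or value < best:
                best = value
    return best


def cell_index(chi, r):
    """kappa = ceil(r + chi), taken componentwise, as in (12.42)."""
    return tuple(ceil(r_i + c_i) for r_i, c_i in zip(r, chi))
```

With W = [[1, 2], [3, 1]], r = [0.5, 1.25], and g = [2, 3], the tender vectors [0.2, 0.3] and [0.5, 0.75] share the cell index (1, 2) and hence the same value, whereas [0.6, 0.75] lies in the cell (2, 2) and yields a larger value.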


Based on the above observation that function values are constant over
regions, Schultz et al. [32] and Ahmed et al. [1] developed algorithms that
construct partitions of the first-stage feasible region using $C(\cdot)$ and $\mathcal{C}(\cdot)$,
respectively, and evaluate expected values, $Q(\cdot)$ and $\mathcal{Q}(\cdot)$, on these partitions.
The algorithm in the former paper operates over the $x$-space, whereas that
in the latter works in the $\chi$-space.

Under Assumption A1 that the first-stage feasible region $X$ is compact,
Schultz et al. [32] show that there are a finite number of vertices of the sets
$C(\cdot) \cap X$, and these vertices contain an optimal solution for $x$. (If $X$ is not
compact, this vertex set is only countable, and a level set is used to bound the
feasible region to guarantee finite termination.) Aside from A1–A6, Schultz
et al. [32] further assume the following.


A7. The recourse matrix $W$ is fixed.
A8. The first-stage variables are continuous; that is, $I_b = I_{\mathrm{int}} = \emptyset$.


The purpose of these assumptions is to use the Gröbner-basis method from
computational algebra to evaluate the second-stage value functions. Although
the computation of Gröbner bases is expensive, it only uses the $T$ and $W$
matrices, and does not rely on the right-hand-side values. Hence, for fixed
$T$ and $W$, Gröbner bases need to be computed only once, and then for each
different right-hand-side value, the second-stage function evaluation, which
reduces to a single generalized division, is very easy. Using this idea, the
objective value $cx + Q(x)$ is evaluated at each candidate point from the finite
vertex set using Gröbner bases, and after enumerating all these candidate
points, the one that yields the lowest function value is an optimal solution to
the problem. Improved versions of this algorithm are proposed to reduce the
number of candidate points that need to be evaluated.

The explicit enumeration approach is still quite expensive for nontrivial
problems. Using a systematic enumeration scheme such as B&B to evaluate
the partitions is more practical. However, in the $x$-space, the discontinuities
between these partitions are not orthogonal to the ($x$-) variable axes, thus
branching on $x$-variables would result in discontinuities in the interior of the
hyperrectangles defining the nodal problems. To circumvent this difficulty,
Ahmed et al. [1] propose to transform the problem into the space of the
tender variable vector $\chi$:

\[
\mathrm{TP}: \quad \min\Big\{cx + \sum_{s \in S} p_s F_s(\chi) \ \Big|\ x \in X,\ -Tx = \chi, \text{ and (12.41)}\Big\}, \tag{12.44}
\]

so that the partitioned hyperrectangles are of the form (12.43b), having
constant second-stage function values, and the discontinuities between these
hyperrectangles are orthogonal to the ($\chi$-) variable axes. They show that if $\chi^*$
is an optimal solution to (12.44), then $x^* \in \arg\min\{cx \mid x \in X,\ -Tx = \chi^*\}$ is
an optimal solution of (12.1).

Their B&B algorithm can handle general first-stage variables and
scenario-dependent recourse. Hence, Assumptions A7 and A8 are not needed. For
each hyperrectangular partition $P^k = \prod_{j=1}^{m_2} (l_j^k, u_j^k]$ in the form of (12.42b),
because $F_s(\chi)$ is lower semicontinuous and nondecreasing, we can obtain a
lower bound on the two-stage problem defined on $P^k$ by solving the following
problem.

\[
\begin{array}{lll}
\text{Minimize} & cx + \eta & (12.45a) \\
\text{subject to:} & x \in X, \quad -Tx = \chi & (12.45b) \\
& l^k \le \chi \le u^k & (12.45c) \\
& \eta \ge \sum_{s \in S} p_s F_s(l^k + \epsilon) & (12.45d)
\end{array}
\]

where $F_s(\cdot)$ is as defined in (12.41), and $\epsilon$ is calculated a priori as a sufficiently
small number such that $F_s(\cdot)$ is constant over $(l^k, l^k + \epsilon]$. The value for $\epsilon$ is
decided as follows. Along each axis $j$, $\forall j = 1, \ldots, m_2$, within the unit interval
$(k_{1j} - r_{1j} - 1,\ k_{1j} - r_{1j}]$ for some $k_{1j} \in \mathbb{Z}$, the candidate point of discontinuity
for each scenario $s$ is identified as $\lfloor k_{1j} - r_{1j} + r_{sj} \rfloor - r_{sj}$, $\forall s \in S$. These
points of discontinuity repeat the same pattern to the right of $k_{1j} - r_{1j}$ with
a unit period. It is then sufficient to sort these points, and obtain the smallest
interval as $\epsilon_j$ along axis $j$. The final $\epsilon$-value is chosen as strictly smaller than
each $\epsilon_j$.
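
The computation of $\epsilon$ can be sketched as follows. Because the discontinuity points along axis $j$ sit at $k - r_{sj}$ for integer $k$, it suffices to work with their offsets modulo one. The function names, the `shrink` factor used to obtain a value strictly below every $\epsilon_j$, and the use of exact rational arithmetic are assumptions made for this illustration rather than details taken from [1].

```python
from fractions import Fraction


def epsilon_along_axis(r_j):
    """Smallest gap between consecutive discontinuity points along one axis.

    r_j holds the right-hand-side entries r_{sj} over all scenarios s.
    Discontinuities of F_s occur where chi_j + r_{sj} is an integer, so the
    offsets (-r_{sj}) mod 1 describe the pattern, which repeats with period 1.
    """
    points = sorted({(-Fraction(r)) % 1 for r in r_j})
    if len(points) == 1:
        return Fraction(1)  # all scenarios share one offset: gap is the period
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(points[0] + 1 - points[-1])  # wrap-around gap across the period
    return min(gaps)


def choose_epsilon(r, shrink=Fraction(1, 2)):
    """Return a value strictly smaller than every eps_j; r[s][j] = r_{sj}."""
    m2 = len(r[0])
    eps_min = min(epsilon_along_axis([row[j] for row in r]) for j in range(m2))
    return shrink * eps_min
```

For three scenarios with r = [[0, 1/2], [1/4, 1/2], [3/5, 1/3]] (entered as `Fraction` objects), the smallest gaps are 1/4 along the first axis and 1/6 along the second, so the sketch returns 1/12.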

At the node selection step, the partition that yields the least lower bound is 
selected for further branching. At the branching step, the branching variable 
x,’ is selected such that yj + 1.,’ is an integer greater than 1, and Q(x) is 
discontinuous at $\chi_{j'}$ for some scenario $s$. Let $y^s$ be the solution obtained from
(12.45) for scenario $s$. For each $j = 1, \ldots, m_2$, let
$p_j = \min_{s \in S}\{(W_s y^s)_j - r_{sj}\}$, and let the variable $\chi_{j'}$ be chosen such that
$p_{j'}$ lies most in-between its bounds $l^k_{j'}$ and $u^k_{j'}$. Partition $P^k$ is then branched
along axis $j'$ at the value $p_{j'}$. Using such a B&B scheme, the algorithm therefore
avoids an exhaustive enumeration of the partitions, yet guarantees finite convergence.

This algorithm is applied by Ahmed and Garcia [2] to a dynamic acqui- 
sition and assignment problem. Problems up to the size of 24,027 variables 
(including 24,009 binary variables), 10,518 constraints, and 46,545 nonzeros 
were randomly generated to test the efficacy of the algorithm. On a Sun Sparc 
Ultra60 workstation, 85% of the problems were solved within the specified 
tolerance. The remaining prematurely terminated problems reached an aver- 
age gap of 2.05% for those due to time limitation and 1.10% for those due 
to memory or node limitation. Using CPLEX 7.0, however, only 27% of the 
problems were solved. 

Similar to Schultz et al. [32], Kong et al. [22] examine integer programs that
differ only in their right-hand sides. In lieu of Gröbner bases, their algorithms
use stored value functions of parameterized integer programs. In addition to
Assumptions A1–A7, they further assume the following.


A9. The second-stage objective coefficient vector g is fixed. 
A10. The first-stage variables are purely integer-restricted; that is, $x \in \mathbb{Z}_+^{n_1}$.
A11. All elements in $A$, $T$, $b$, and $r_s$, $\forall s \in S$, are integral.


The first- and second-stage problems now both belong to the type of pure
integer programs that have integral coefficient matrices and right-hand-side
vectors. In general, given a coefficient matrix $G \in \mathbb{Z}^{m \times n}$, this type of integer
program can be expressed as

\[
(\mathrm{VF}): \quad z(\zeta) = \min\{dx \mid Gx \ge \zeta,\ x \in \mathbb{Z}_+^n\}, \quad \text{for } \zeta \in \mathbb{Z}^m, \tag{12.46}
\]

where $z(\cdot): \mathbb{Z}^m \mapsto \mathbb{Z}$ is its value function. The function $z(\cdot)$ is nondecreasing
over $\mathbb{Z}^m$ and subadditive over
$D = \{\zeta \in \mathbb{Z}^m \mid \{x \in \mathbb{Z}_+^n \mid Gx \ge \zeta\} \ne \emptyset\}$. That
is, for $\zeta_1, \zeta_2 \in D$, $\zeta_1 + \zeta_2 \in D$ implies that $z(\zeta_1) + z(\zeta_2) \ge z(\zeta_1 + \zeta_2)$.
(The authors use the superadditivity property for maximization problems. We
chose to use subadditivity for minimization problems to maintain consistency
in exposition.)

Using these properties, Kong et al. [22] develop an integer programming-
based algorithm and a dynamic programming-based algorithm that compute
the value of $z(\zeta)$ for each $\zeta$ in a finite set $\Upsilon$. In the integer programming-based
algorithm, at each iteration $k$, some $\zeta^k \in \Upsilon$ is selected, and the corresponding
integer program $\min\{dx \mid Gx \ge \zeta^k,\ x \in \mathbb{Z}_+^n\}$ is solved to obtain a solution
$\hat{x}^k$. Based on the value of $\hat{x}^k$, other $\zeta \in \Upsilon$ are then selected to have their
lower bounds, $l(\zeta)$, and upper bounds, $u(\zeta)$, updated using the nondecreasing
and subadditivity properties of $z(\zeta)$, until $l(\zeta) = u(\zeta)$, for all $\zeta \in \Upsilon$, at
which point, $z(\zeta) = l(\zeta) = u(\zeta)$ are available for all $\zeta \in \Upsilon$. The dynamic
programming-based approach does not solve any integer program, but is only
applicable when $G$ is nonpositive. Denote $\Upsilon_j$ to be the set of $\zeta \in \Upsilon$ such that
$\zeta \le G_j$. This algorithm uses the initial condition $z(\zeta) = 0$,
$\forall \zeta \in \Upsilon \setminus \bigcup_{j=1}^{n} \Upsilon_j$,
and the recursive function
$z(\zeta) = \min\{d_j + z(\zeta - G_j) : \zeta \in \Upsilon_j,\ j = 1, \ldots, n\}$,
$\forall \zeta \in \bigcup_{j=1}^{n} \Upsilon_j$. Naturally, both algorithms work better when the size of $\Upsilon$ is
small.
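
A knapsack-type recursion of this kind can be sketched as below and checked against brute-force enumeration. The sketch is written for the minimization form (12.46) with a nonpositive matrix $G$ and is not the implementation of [22]. It assumes in addition that every cost $d_j$ is nonpositive (so that using a fitting column is never worse than stopping) and that every column has at least one strictly negative entry (so that the recursion terminates); the function names and the memoization via `lru_cache` are illustrative choices.

```python
from functools import lru_cache
from itertools import product


def make_value_function(G_cols, d):
    """Return z(zeta) = min{ d x : G x >= zeta, x integer >= 0 } for zeta <= 0.

    G_cols: columns of G as tuples of nonpositive integers, each with at
    least one negative entry (assumption). d: costs, all assumed <= 0.
    """
    @lru_cache(maxsize=None)
    def z(zeta):
        candidates = [
            d_j + z(tuple(a - b for a, b in zip(zeta, G_j)))
            for G_j, d_j in zip(G_cols, d)
            if all(a <= b for a, b in zip(zeta, G_j))  # column j still fits
        ]
        # Initial condition: no column fits, so only x = 0 is feasible.
        return min(candidates) if candidates else 0

    return z


def brute_force(G_cols, d, zeta, x_max=6):
    """Enumerate x in {0,...,x_max}^n; used only to check the recursion."""
    best = 0  # x = 0 is feasible whenever zeta <= 0
    for x in product(range(x_max + 1), repeat=len(d)):
        lhs = [sum(G_j[i] * x_j for G_j, x_j in zip(G_cols, x))
               for i in range(len(zeta))]
        if all(a >= b for a, b in zip(lhs, zeta)):
            best = min(best, sum(d_j * x_j for d_j, x_j in zip(d, x)))
    return best
```

Because each recursive call moves $\zeta$ componentwise toward the origin, every right-hand side in the grid is evaluated at most once, which is what makes storing the whole value function practical when the set of right-hand sides is small.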

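To use these value function evaluation algorithms to solve two-stage
SMIPs, denote the first-stage value function as

\[
z_1(\zeta_1) = \min\{cx \mid Ax \ge b,\ Tx \ge \zeta_1,\ x \in \mathbb{Z}_+^{n_1}\}, \quad
\forall \zeta_1 \in \Upsilon^1 = \{\zeta_1 \in \mathbb{Z}^{m_2} \mid \zeta_1 = Tx \text{ for some } x \in X\},
\]

and denote the second-stage value function as

\[
z_2(\zeta_2) = \min\{gy \mid Wy \ge \zeta_2,\ y \in \mathbb{Z}_+^{n_2}\}, \quad
\forall \zeta_2 \in \Upsilon^2 = \bigcup_{\zeta_1 \in \Upsilon^1} \bigcup_{s \in S} \{r_s - \zeta_1\}.
\]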


These function values for the first- and second-stage problems are stored for
all possible right-hand-side values in $\Upsilon^1$ and $\Upsilon^2$, respectively. (Due to the
storage requirement, this approach is more suitable for problems having a
relatively small number of rows.) After storing these value function responses,
the feasible region for $\zeta_1$ is systematically searched to obtain an optimal value
$\zeta_1^*$. A finitely convergent B&B search scheme is proposed. At node $k$ defined
on a hyperrectangle $[l^k, u^k]$, a lower bound $LB^k$ and an upper bound $UB^k$
are computed as

\[
LB^k = z_1(l^k) + \sum_{s \in S} p_s z_2(r_s - u^k),
\]

and

\[
UB^k = z_1(u^k) + \sum_{s \in S} p_s z_2(r_s - u^k).
\]

Branching is performed by partitioning the current hyperrectangle by
bisecting some selected axis, which ensures convergence.
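
The nodal bound computation and the bisection step can be sketched as follows, given the two stored value functions as callables. The names `node_bounds` and `bisect`, the tuple representation of a hyperrectangle, and the integer split into $[l, \text{mid}]$ and $[\text{mid}+1, u]$ along the chosen axis are assumptions made for illustration; the value functions in the usage note are made-up monotone stand-ins, not solutions of actual integer programs.

```python
def node_bounds(l, u, z1, z2, r, p):
    """Lower and upper bounds on a box [l, u] of first-stage right-hand sides.

    z1, z2: stored first- and second-stage value functions (callables on
    tuples); r: list of scenario right-hand sides r_s; p: probabilities p_s.
    """
    # Both bounds use the recourse term evaluated at the upper corner u.
    recourse = sum(
        p_s * z2(tuple(r_j - u_j for r_j, u_j in zip(r_s, u)))
        for p_s, r_s in zip(p, r)
    )
    lower_bound = z1(tuple(l)) + recourse
    upper_bound = z1(tuple(u)) + recourse
    return lower_bound, upper_bound


def bisect(l, u, axis):
    """Split the integer box [l, u] along one axis (assumed splitting rule)."""
    mid = (l[axis] + u[axis]) // 2
    left_u = list(u)
    left_u[axis] = mid
    right_l = list(l)
    right_l[axis] = mid + 1
    return (tuple(l), tuple(left_u)), (tuple(right_l), tuple(u))
```

A node can be fathomed once its two bounds coincide or its lower bound exceeds the best upper bound found so far; otherwise it is bisected and both children are bounded in the same way.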


12.5 Conclusion 


In this chapter, we have reviewed some recent advances in solving two-stage
stochastic (mixed-) integer programs, and have provided some insights and
results that exhibit certain interconnections between the methods. Due to the
dual angular structure, these solution approaches apply various versions of
decomposition, branch-and-bound, and convexification techniques. We have
studied these methods from the viewpoint of the adopted decomposition
framework, the convexification approach used in solving problems having
integer recourse, and the enumeration scheme employed when the technology
matrix is deterministic.


Table 12.1 Literature on solving stochastic mixed-integer programs

First Stage (x)     Second Stage (y)     Literature   Approach                  Section
Binary              Continuous           [23]         L-shaped                  12.2.1
                                         [3]          Branch-and-Fix            12.2.2
Continuous          Integer              [32]         Gröbner basis             12.4
Integer             Integer              [17]         Test-set decomposition    12.2.3
                                         [22]         Value function            12.4
Mixed-integer       Integer              [1]          χ-transformation          12.4
                                         [14]         Dual pricing              12.2.1
Continuous          0-1 mixed-integer    [13]         Disj. prog.               12.3.1
                                         [27]         Disj. prog.               12.3.2
Binary              0-1 mixed-integer    [39]         Benders & RLT             12.3.1
                                         [36]         Disj. prog.               12.3.1
                                         [35]         Disj. prog.               12.3.1
0-1 mixed-integer   0-1 mixed-integer    [41]         Convexification           12.3.2
Mixed-integer       Mixed-integer        [12]         Lagrangian dual           12.2.2
                                         [33]         Lagrangian dual           12.2.2

Table 12.1 lists the literature we have covered, grouped by the type of
variables appearing in the two stages, which is most relevant to the
algorithmic developments. Many of these methods are theoretically extendable
to multistage SMIPs. However, scalability becomes a major issue here, which
requires further research. For an introduction to multistage stochastic integer
programs, we refer the reader to Römisch and Schultz [28].
Chapter 13


Dualistic Riemannian Manifold 
Structure Induced from Convex 
Functions 


Jun Zhang and Hiroshi Matsuzoe 


Key words: Legendre—Fenchel duality, biorthogonal coordinates, Rieman- 
nian metric, conjugate connections, equiaffine geometry, parallel volume form, 
affine immersion, Hessian geometry 


13.1 Introduction 


Convex analysis has wide applications in science and engineering, such as 
mechanics, optimization and control, theoretical statistics, mathematical eco- 
nomics and game theory, and so on. It offers an analytic framework to treat 
systems and phenomena that depart from linearity, based on an elegant math- 
ematical characterization of the notion of “duality” (Rockafellar, 1970, 1974, 
Ekeland and Temam, 1976). Recent work of David Gao (2000) further pro- 
vided a comprehensive and unified treatment of duality principles in con- 
vex and nonconvex systems, greatly enriching the theoretical foundation and 
scope of applications. 

Central to convex analysis is the Legendre-Fenchel transform, and duality 
between two sets of variables defined on a pair of vector spaces that are dual 
with respect to each other. When the convex functions involved are smooth, 
these variables are in one-to-one correspondence; they can actually be viewed 
as two coordinate systems on a certain Riemannian manifold. This is the 
viewpoint from the so-called information geometry (Amari, 1985, Amari and 
Nagaoka, 2000), and it is investigated at great length in this chapter. 


Jun Zhang - Department of Psychology, University of Michigan, Ann Arbor, MI 48109, 
U.S.A., e-mail: junz@umich.edu 


Hiroshi Matsuzoe - Department of Computer Science and Engineering, Nagoya Institute 
of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan 
e-mail: matsuzoe@nitech.ac.jp 


D.Y. Gao, H.D. Sherali, (eds.), Advances in Applied Mathematics and Global Optimization 437 
Advances in Mechanics and Mathematics 17, DOI 10.1007/978-0-387-75714-8 13, 
© Springer Science+Business Media, LLC 2009 




The link between convex functions and Riemannian geometry is shown to 
be severalfold. First, the pair of convex functions conjugate to one another 
are the potential functions that induce the Riemannian metric. Second, the 
two sets of variables are special coordinate systems of the manifold in that 
they are “biorthogonal;” that is, the Jacobian of coordinate transformation 
between them is precisely the Riemannian metric. It turns out that biorthog- 
onal coordinates are global coordinates for a pair of dually flat connections 
on the Riemannian manifold. Third, the Fenchel inequality provides a natu- 
ral way to construct directed (“pseudo-”) distance over the convex point set; 
this is the Bregman divergence (a.k.a. canonical divergence), which gives rise 
to the dually flat connections. Finally, the geometric structure (Riemannian 
metric, conjugate/dual connections) can be induced from graph immersions 
of a convex function into a higher-dimensional affine space. 

Our goal in this chapter is to review such a geometric view of convex 
functions and the associated conjugacy/duality, as well as provide some new 
results. We review the background of convex analysis and Riemannian ge- 
ometry (and affine hypersurface theory) in Section 13.2, with attention to 
the well-established relation between biorthogonal coordinates and dually 
flat (also called “Hessian”) manifolds. In Section 13.3, we develop the
full-fledged $\alpha$-Hessian geometry, which extends the dually flat Hessian manifold
($\alpha = \pm 1$), and give an example from theoretical statistics where such
geometry arises; this parallels the generalization of the convex-induced divergence
function with arbitrary $\alpha$ (Zhang, 2004) from Bregman divergence ($\alpha = \pm 1$).
To close, we give a summary and discuss some open problems in Section 13.4. 


13.2 Convex Functions and Riemannian Geometry 


13.2.1 Convex Functions and the Associated 
Divergence Functions 


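A strictly convex (or simply “convex”) function $\Phi: V \subseteq \mathbb{R}^n \to \mathbb{R}$, $x \mapsto \Phi(x)$
is defined by

\[
\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y)
 - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right) > 0 \tag{13.1}
\]

for all $x \neq y$ for any $|\alpha| < 1$ (the inequality sign is reversed when $|\alpha| > 1$). In
this chapter, $V$ (and $\tilde V$ below) identifies a subset of $\mathbb{R}^n$ both as a point set
and as a vector space. We assume $\Phi$ to be sufficiently smooth (differentiable
up to fourth order). Define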


\[
B_\Phi(x,y) = \Phi(x) - \Phi(y) - \langle x - y,\ \partial\Phi(y)\rangle, \tag{13.2}
\]

where $\partial\Phi = [\partial_1\Phi,\ldots,\partial_n\Phi]$ with $\partial_i \equiv \partial/\partial x^i$ denotes the gradient valued
in the co-vector space $\tilde V \subseteq \mathbb{R}^n$, and $\langle\cdot,\cdot\rangle_n$ denotes the canonical pairing of a
point/vector $x = [x^1,\ldots,x^n] \in V$ and a point/co-vector $u = [u_1,\ldots,u_n] \in \tilde V$
(dual to $V$):

\[
\langle x, u\rangle_n = \sum_{i=1}^{n} x^i u_i. \tag{13.3}
\]

(Where there is no danger of confusion, the subscript $n$ in $\langle\cdot,\cdot\rangle_n$ is often
omitted.) A basic fact in convex analysis is that the necessary and sufficient
condition for a smooth function $\Phi$ to be convex is

\[
B_\Phi(x,y) > 0 \tag{13.4}
\]

for $x \neq y$. We remark that $B_\Phi$ is sometimes called “Bregman divergence”
(Bregman, 1967), widely used in convex optimization literature (Della Pietra
et al., 2002, Bauschke, 2003, Bauschke and Combettes, 2003, Bauschke et al.,
2003).
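To make (13.2) and (13.4) concrete, the following minimal sketch evaluates the Bregman divergence for one particular potential. The choice of the negative entropy $\Phi(x) = \sum_i x^i \log x^i$ (on positive vectors) and the sample points are assumptions made for illustration; they do not come from the text. For probability vectors this choice reduces $B_\Phi$ to the Kullback-Leibler divergence.

```python
import math

def phi(x):
    """Illustrative potential: negative entropy, strictly convex for x_i > 0."""
    return sum(xi * math.log(xi) for xi in x)

def grad_phi(x):
    """Gradient of the negative entropy: d(phi)/dx_i = log(x_i) + 1."""
    return [math.log(xi) + 1.0 for xi in x]

def bregman(x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - <x - y, grad Phi(y)>, as in (13.2)."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

# Two sample points (arbitrary probability vectors).
x = [0.2, 0.3, 0.5]
y = [0.6, 0.3, 0.1]

print("B(x, y) =", bregman(x, y))   # positive, cf. (13.4)
print("B(y, x) =", bregman(y, x))   # generally differs: B is not symmetric
print("B(x, x) =", bregman(x, x))   # vanishes on the diagonal
```

The asymmetry visible in the two printed values is what the text means by a directed distance.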

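Zhang (2004) introduced the following family of functions on $V \times V$ as
indexed by $\alpha \in \mathbb{R}$,

\[
D_\Phi^{(\alpha)}(x,y) = \frac{4}{1-\alpha^2}\left(\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y)
 - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right). \tag{13.5}
\]

Here $D_\Phi^{(\pm 1)}(x,y)$ is defined by taking $\lim_{\alpha\to\pm 1}$:

\[
D_\Phi^{(1)}(x,y) = D_\Phi^{(-1)}(y,x) = B_\Phi(x,y),
\]
\[
D_\Phi^{(-1)}(x,y) = D_\Phi^{(1)}(y,x) = B_\Phi(y,x).
\]

Note that $D_\Phi^{(\alpha)}(x,y)$ satisfies the relation (called “referential duality” in
Zhang, 2006a)

\[
D_\Phi^{(\alpha)}(x,y) = D_\Phi^{(-\alpha)}(y,x);
\]

that is, exchanging the asymmetric status of the two points (in the directed
distance) amounts to $\alpha \leftrightarrow -\alpha$.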
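The family (13.5) and its referential duality can be checked numerically. The sketch below assumes the potential $\Phi(x) = \sum_i e^{x^i}$ and two arbitrary sample points, neither of which appears in the text; it evaluates $D_\Phi^{(\alpha)}$, compares $D_\Phi^{(\alpha)}(x,y)$ with $D_\Phi^{(-\alpha)}(y,x)$, and approaches the Bregman limit as $\alpha \to 1$.

```python
import math

def phi(x):
    """Illustrative strictly convex potential on R^n."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def bregman(x, y):
    """B_Phi(x, y) as in (13.2)."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((a - b) * c for a, b, c in zip(x, y, g))

def d_alpha(x, y, alpha):
    """D^(alpha)_Phi(x, y) from (13.5); valid for alpha not equal to +1 or -1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mid = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mid))

x = [0.0, 1.0]
y = [0.5, -0.5]

# Referential duality: swapping the points flips the sign of alpha.
print(d_alpha(x, y, 0.3), d_alpha(y, x, -0.3))

# alpha close to +1 approximates the Bregman divergence B_Phi(x, y).
print(d_alpha(x, y, 1 - 1e-6), bregman(x, y))

# |alpha| > 1 still gives a nonnegative value (here the domain is all of R^n).
print(d_alpha(x, y, 3.0))
```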


From its construction, $D_\Phi^{(\alpha)}(x,y)$ is nonnegative for $|\alpha| < 1$ due to equa-
tion (13.1), and for $|\alpha| = 1$ due to equation (13.4). For $|\alpha| > 1$, assuming
$\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right) \in V$, the nonnegativity of $D_\Phi^{(\alpha)}(x,y)$ can also
be proven due to the inequality (13.1) reversing its sign. Therefore, we have


Lemma 13.1. For a smooth function $\Phi: V \subseteq \mathbb{R}^n \to \mathbb{R}$, the following condi-
tions are equivalent (for $x, y \in V$).

(i) $\Phi$ is strictly convex.
(ii) $D_\Phi^{(1)}(x,y) \geq 0$.
(iii) $D_\Phi^{(-1)}(x,y) \geq 0$.
(iv) $D_\Phi^{(\alpha)}(x,y) \geq 0$ for all $|\alpha| < 1$.
(v) $D_\Phi^{(\alpha)}(x,y) \geq 0$ for all $|\alpha| > 1$.


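Recall that, when $\Phi$ is convex, its convex conjugate $\tilde\Phi: \tilde V \subseteq \mathbb{R}^n \to \mathbb{R}$ is
defined through the Legendre-Fenchel transform:

\[
\tilde\Phi(u) = \langle (\partial\Phi)^{-1}(u),\ u\rangle - \Phi\big((\partial\Phi)^{-1}(u)\big), \tag{13.6}
\]

with $\tilde{\tilde\Phi} = \Phi$ and $(\partial\tilde\Phi) = (\partial\Phi)^{-1}$. The function $\tilde\Phi$ is also convex, and through
it (13.4) precisely expresses the Fenchel inequality

\[
\Phi(x) + \tilde\Phi(u) - \langle x, u\rangle \geq 0
\]

for any $x \in V$, $u \in \tilde V$, with equality holding if and only if

\[
u = (\partial\Phi)(x) = (\partial\tilde\Phi)^{-1}(x) \ \longleftrightarrow\ x = (\partial\tilde\Phi)(u) = (\partial\Phi)^{-1}(u), \tag{13.7}
\]

or, in component form,

\[
u_i = \frac{\partial\Phi}{\partial x^i} \ \longleftrightarrow\ x^i = \frac{\partial\tilde\Phi}{\partial u_i}. \tag{13.8}
\]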
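The conjugacy relations (13.6) to (13.8) can be illustrated with a potential whose conjugate is known in closed form. The sketch assumes $\Phi(x) = \sum_i e^{x^i}$, for which $(\partial\Phi)^{-1}(u) = [\log u_1,\ldots,\log u_n]$ and $\tilde\Phi(u) = \sum_i (u_i \log u_i - u_i)$ on $u_i > 0$; this example and the function names are illustrative choices, not taken from the text.

```python
import math

def phi(x):
    """Illustrative potential Phi(x) = sum_i exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    """u = (d Phi)(x), componentwise u_i = exp(x_i), cf. (13.8)."""
    return [math.exp(xi) for xi in x]

def phi_conj(u):
    """Closed-form conjugate of phi via (13.6): sum_i (u_i log u_i - u_i)."""
    return sum(ui * math.log(ui) - ui for ui in u)

def grad_phi_conj(u):
    """x = (d Phi~)(u), componentwise x_i = log(u_i), cf. (13.8)."""
    return [math.log(ui) for ui in u]

def fenchel_gap(x, u):
    """Left side of the Fenchel inequality: Phi(x) + Phi~(u) - <x, u>."""
    return phi(x) + phi_conj(u) - sum(a * b for a, b in zip(x, u))

x = [0.3, -1.2]
u_conjugate = grad_phi(x)          # the conjugate variable of x
u_other = [2.0, 0.5]               # an unrelated point of the dual space

print(fenchel_gap(x, u_conjugate))  # numerically zero: equality case (13.7)
print(fenchel_gap(x, u_other))      # strictly positive
print(grad_phi_conj(u_conjugate))   # recovers x up to rounding
```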


With the aid of conjugate variables, we can introduce the “canonical di-
vergence” $A_\Phi: V \times \tilde V \to \mathbb{R}_+$ (and $A_{\tilde\Phi}: \tilde V \times V \to \mathbb{R}_+$), where $\mathbb{R}_+ = \mathbb{R}^+ \cup \{0\}$:

\[
A_\Phi(x,v) = \Phi(x) + \tilde\Phi(v) - \langle x, v\rangle = A_{\tilde\Phi}(v,x).
\]

They are related to the Bregman divergence (13.2) via

\[
B_\Phi\big(x, (\partial\Phi)^{-1}(v)\big) = A_\Phi(x,v) = B_{\tilde\Phi}\big(v, (\partial\Phi)(x)\big).
\]

Bregman (or canonical) divergence¹ provides a measure of directed distance
between two points; that is, it is nonnegative for all values of $x, y \in V$, and
vanishes only when $x = y$. More formally, a divergence function $D: V \times V \to \mathbb{R}_+$
is a smooth function (differentiable up to third order) that satisfies

(i) $D(x,y) \geq 0\ \ \forall x, y \in V$, with equality holding if and only if $x = y$.
(ii) $\partial_{x^i} D(x,y)\big|_{x=y} = \partial_{y^i} D(x,y)\big|_{x=y} = 0$.
(iii) $\partial_{x^i}\partial_{y^j} D(x,y)\big|_{x=y}$ is negative definite.

Here $\partial_{x^i}$ denotes the partial derivative with respect to the $i$th component of the
$x$-variable only.²


1 The divergence function, also called the “contrast function,” is a terminology arising out 
of the theoretical statistics literature. It has nothing to do with the divergence operation 
in vector calculus. 

2 The reader should not confuse the shorthand notations $\partial_i$ with $\partial_{x^i}$ (or $\partial_{y^i}$): the former
operates on a function defined on $V$ such as $\Phi: x \mapsto \Phi(x) \in \mathbb{R}$, whereas the latter operates
on a function defined on $V \times V$ such as $D: (x,y) \mapsto D(x,y) \in \mathbb{R}_+$.


13.2.2 Differentiable Manifold: Metric and Connection
Structures


A differentiable manifold $\mathfrak{M}$ is a space that locally “looks like” a Euclidean
space $\mathbb{R}^n$. By “looks like,” we mean that for any base (reference) point $p \in
\mathfrak{M}$, there exists a bijective mapping (“coordinate functions”) between the
neighborhood of $p$ (i.e., a patch of the manifold) and a subset $V$ of $\mathbb{R}^n$. By
locally, we mean that various such mappings must be smoothly related to one
another (if they are centered at the same reference point) or consistently glued
together (if they are centered at different reference points) and globally cover
the entire manifold. Below, we assume that a coordinate system is chosen
such that each point is indexed by $x \in V$, with the origin as the reference
point.

A manifold is specified with certain structures. First, there is an inner-
product structure associated with tangent spaces of the manifold. This is
given by the metric tensor field $g$ which is, when evaluated at each location $x$
(omitted in our notation), a symmetric bilinear form $g(\cdot,\cdot)$ of tangent vectors
$X, Y \in T_x(\mathfrak{M}) \simeq \mathbb{R}^n$ such that $g(X,X)$ is always positive for all nonzero
vectors $X$. In local coordinates with bases $\partial_i \equiv \partial/\partial x^i$, $i = 1,\ldots,n$ (i.e.,
$X, Y$ are expressed as $X = \sum_i X^i\partial_i$, $Y = \sum_i Y^i\partial_i$), the components of $g$ are
denoted as

\[
g_{ij}(x) = g(\partial_i, \partial_j). \tag{13.9}
\]

The metric tensor allows us to define the distance between two points on a
manifold as the length of the shortest curve (called “geodesic”) connecting
them. It also allows the measurement of angles and hence defines
orthogonality of a vector to a submanifold.
Projections of vectors to a lower-dimensional submanifold become possible
once a metric is given.

Second, there is a structure associated with the notion of “parallelism” of
vector fields on a manifold. This is given by the affine (linear) connection (or
simply “connection”) $\nabla$, mapping two vector fields $X$ and $Y$ to a third one
denoted by $\nabla_Y X$: $(X,Y) \mapsto \nabla_Y X$. Intuitively, it represents the “intrinsic”
difference of the vector field $X$ from its value at point $x$ and its value at a
nearby point connected to $x$ (in the direction given by $Y$). Here “intrinsic”
means that vector comparison at two neighboring locations of the manifold is
through a process called “parallel transport,” whereby a vector’s components
are adjusted as it moves across points on the base manifold. Under the local
coordinate system with bases $\partial_i \equiv \partial/\partial x^i$, components of $\nabla$ can be written
out in “contravariant” form denoted $\Gamma^l_{ij}$ (which is a collection of $n^3$ functions
of $x$),

\[
\nabla_{\partial_i}\partial_j = \sum_l \Gamma^l_{ij}\,\partial_l. \tag{13.10}
\]

Under coordinate transform $x \mapsto \tilde x$, the new set of functions $\tilde\Gamma$ are related to
old ones $\Gamma$ via

\[
\tilde\Gamma^l_{mn}(\tilde x) = \sum_k \left(\sum_{i,j} \frac{\partial x^i}{\partial\tilde x^m}\frac{\partial x^j}{\partial\tilde x^n}\,\Gamma^k_{ij}(x)
 + \frac{\partial^2 x^k}{\partial\tilde x^m\,\partial\tilde x^n}\right)\frac{\partial\tilde x^l}{\partial x^k}. \tag{13.11}
\]

A curve whose tangent vectors are intrinsically parallel along it is called an
“auto-parallel curve.”

As a primitive on a manifold, affine connections can be characterized in
terms of their torsion and curvature. The torsion $T$ of a connection $\Gamma$, which
is a tensor itself, is given by the asymmetric part of the connection $T(\partial_i,\partial_j) =
\nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i = \sum_k T^k_{ij}\,\partial_k$, where $T^k_{ij}$ is its local representation³ given as

\[
T^k_{ij}(x) = \Gamma^k_{ij}(x) - \Gamma^k_{ji}(x).
\]

The curviness/flatness of a connection $\Gamma$ is described by the curvature tensor
$R$, defined as

\[
R(\partial_i,\partial_j)\partial_k = \big(\nabla_{\partial_i}\nabla_{\partial_j} - \nabla_{\partial_j}\nabla_{\partial_i}\big)\partial_k.
\]

Writing $R(\partial_i,\partial_j)\partial_k = \sum_l R^l_{kij}\,\partial_l$ and substituting (13.10), the components of
the curvature tensor are⁴

\[
R^l_{kij}(x) = \frac{\partial\Gamma^l_{jk}(x)}{\partial x^i} - \frac{\partial\Gamma^l_{ik}(x)}{\partial x^j}
 + \sum_m \Gamma^l_{im}(x)\,\Gamma^m_{jk}(x) - \sum_m \Gamma^l_{jm}(x)\,\Gamma^m_{ik}(x).
\]

By definition, $R^l_{kij}$ is antisymmetric when $i \leftrightarrow j$. A connection is said to be
flat when $R^l_{kij}(x) \equiv 0$. Note that this is a tensorial condition, so that the
flatness of a connection $\nabla$ is a coordinate-independent property even though
the local expression of the connection (in terms of $\Gamma$) is highly coordinate-
dependent. For any flat connection, there exists a local coordinate system
under which $\Gamma^k_{ij}(x) \equiv 0$ in a neighborhood; this is the affine coordinate for a
flat connection.

In the above discussions, metric and connections are treated as inducing
separate structures on a manifold. On a manifold where both are defined,
it is convenient to express a connection $\Gamma$ in its “covariant” form

\[
\Gamma_{ij,k} = g(\nabla_{\partial_i}\partial_j, \partial_k) = \sum_l g_{lk}\,\Gamma^l_{ij}. \tag{13.12}
\]

Although $\Gamma^k_{ij}$ is the more primitive quantity that does not involve the metric,
$\Gamma_{ij,k}$ represents the projection of an intrinsically differentiated vector field
onto the manifold spanned by the bases $\partial_k$. The covariant form of the curva-
ture tensor is (cf. footnote 4)

\[
R_{lkij} = \sum_m g_{lm}\,R^m_{kij}.
\]

When the connection is torsion free, $R_{lkij}$ is antisymmetric when $i \leftrightarrow j$ or
when $k \leftrightarrow l$, and symmetric when $(i,j) \leftrightarrow (l,k)$. It is related to the Ricci
tensor Ric (to be defined in (13.27) below) via $\mathrm{Ric}_{kj} = \sum_{i,l} R_{lkij}\,g^{il}$.

3 Here and below, we restrict to holonomic coordinate systems in $\mathbb{R}^n$ only, where all
coordinate bases commute, $[\partial_i,\partial_j] = 0$ for $i \neq j$.

4 This componentwise notation of curvature tensor here follows standard differential ge-
ometry textbooks, such as Nomizu and Sasaki (1994). On the other hand, information ge-
ometers, such as Amari and Nagaoka (2000), adopt the notation $R(\partial_i,\partial_j)\partial_k = \sum_l R^l_{ijk}\,\partial_l$
with $R_{ijkl} = \sum_m R^m_{ijk}\,g_{ml}$.


13.2.3 Dualistic Structure on a Manifold: 
Compatibility Between Metric and Connection 


A fundamental theorem of Riemannian geometry states that given a metric,
there is a unique connection (among the class of torsion-free connections)
that “preserves” the metric; that is, the following condition is satisfied:

\[
\partial_k\,g(\partial_i,\partial_j) = g(\hat\nabla_{\partial_k}\partial_i, \partial_j) + g(\partial_i, \hat\nabla_{\partial_k}\partial_j). \tag{13.13}
\]

Such a connection, denoted as $\hat\nabla$, is known as the Levi-Civita connection. Its
component forms, called Christoffel symbols, are determined by the compo-
nents of the metric tensor as (“Christoffel symbols of the second kind”)

\[
\hat\Gamma^k_{ij} = \sum_l \frac{g^{kl}}{2}\left(\frac{\partial g_{il}}{\partial x^j} + \frac{\partial g_{jl}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l}\right)
\]

and (“Christoffel symbols of the first kind”)

\[
\hat\Gamma_{ij,k} = \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right).
\]

The Levi-Civita connection is compatible with the metric, in the sense that
it treats tangent vectors of the shortest curves on a manifold as being parallel
(or equivalently, auto-parallel curves are also geodesics).

It turns out that one can define a kind of “compatibility” relation more
general than expressed by (13.13), by introducing the notion of “conjugacy”
(denoted by $*$) between two connections. A connection $\nabla^*$ is said to be “con-
jugate” to $\nabla$ with respect to $g$ if

\[
\partial_k\,g(\partial_i,\partial_j) = g(\nabla_{\partial_k}\partial_i, \partial_j) + g(\partial_i, \nabla^*_{\partial_k}\partial_j). \tag{13.14}
\]

Clearly, $(\nabla^*)^* = \nabla$. Moreover, $\hat\nabla$, which satisfies (13.13), is special in the
sense that it is self-conjugate, $(\hat\nabla)^* = \hat\nabla$.
Because the metric tensor $g$ provides a one-to-one mapping between points
in the tangent space (i.e., vectors) and points in the cotangent space (i.e.,
co-vectors), (13.14) can also be seen as characterizing how co-vector fields are
to be parallel-transported in order to preserve their dual pairing $\langle\cdot,\cdot\rangle$ with
vector fields.

Writing out (13.14) explicitly,

\[
\frac{\partial g_{ij}}{\partial x^k} = \Gamma_{ki,j} + \Gamma^*_{kj,i}, \tag{13.15}
\]

where, analogous to (13.10) and (13.12),

\[
\nabla^*_{\partial_i}\partial_j = \sum_l \Gamma^{*l}_{ij}\,\partial_l
\]

so that

\[
\Gamma^*_{kj,i} = g(\nabla^*_{\partial_k}\partial_j, \partial_i) = \sum_l g_{il}\,\Gamma^{*l}_{kj}.
\]

In the following, a manifold $\mathfrak{M}$ with a metric $g$ and a pair of conjugate
connections $\Gamma$, $\Gamma^*$ with respect to $g$ is called a “Riemannian manifold with
dualistic structure,” and denoted by $\{\mathfrak{M}, g, \Gamma, \Gamma^*\}$. Obviously, $\Gamma$ and $\Gamma^*$ sat-
isfy the relation (in either covariant or contravariant forms)

\[
\hat\Gamma = \frac{1}{2}\,(\Gamma + \Gamma^*).
\]


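More generally, in information geometry, a one-parameter family of affine
connections $\Gamma^{(\alpha)}$, called “$\alpha$-connections” ($\alpha \in \mathbb{R}$), is introduced (Amari, 1985,
Amari and Nagaoka, 2000)

\[
\Gamma^{(\alpha)} = \frac{1+\alpha}{2}\,\Gamma + \frac{1-\alpha}{2}\,\Gamma^*. \tag{13.16}
\]

Obviously, $\Gamma^{(0)} = \hat\Gamma$.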

It can be shown that the curvatures Rj,i;, Ritij for the pair of conjugate 
connections I”, I* satisfy 

Riki = Rigij- 

So, $\Gamma$ is flat if and only if $\Gamma^*$ is flat. In this case, the manifold is said to be
“dually flat.” When $\Gamma$, $\Gamma^*$ are dually flat, then $\Gamma^{(\alpha)}$ is called “$\alpha$-transitively
flat” (Uohashi, 2002). In such case, $\{\mathfrak{M}, g, \Gamma^{(\alpha)}, \Gamma^{(-\alpha)}\}$ is called an “$\alpha$-Hes-
sian manifold,” or a manifold with $\alpha$-Hessian structure.


13.2.4 Biorthogonal Coordinate Transformation 


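Consider coordinate transform $x \mapsto u$,

\[
\partial^i \equiv \frac{\partial}{\partial u_i} = \sum_l \frac{\partial x^l}{\partial u_i}\frac{\partial}{\partial x^l} = \sum_l J^{li}\,\partial_l,
\]

where the Jacobian matrix $J$ is given by

\[
J_{ij}(x) = \frac{\partial u_i}{\partial x^j} \ \longleftrightarrow\ J^{ij}(u) = \frac{\partial x^i}{\partial u_j},
\qquad \sum_l J_{il}\,J^{lj} = \delta_i^j, \tag{13.17}
\]

where $\delta_i^j$ is the Kronecker delta (taking the value of 1 when $i = j$ and 0
otherwise). If the new coordinate system $u = [u_1,\ldots,u_n]$ (with components
expressed by subscripts) is such that

\[
J_{ij}(x) = g_{ij}(x), \tag{13.18}
\]

then the $x$-coordinate system and the $u$-coordinate system are said to be
“biorthogonal” to each other because, from the definition of a metric tensor
(13.9),

\[
g(\partial_i, \partial^j) = g\Big(\partial_i, \sum_l J^{lj}\,\partial_l\Big) = \sum_l J^{lj}\,g(\partial_i,\partial_l) = \sum_l J^{lj}\,g_{il} = \delta_i^j.
\]

In such a case, denote

\[
g^{ij}(u) = g(\partial^i, \partial^j), \tag{13.19}
\]

which equals $J^{ij}(u)$, the Jacobian of the inverse coordinate transform $u \mapsto x$.
Also introduce the (contravariant version) of the affine connection $\Gamma$ under
the $u$-coordinate system and denote it by an unconventional notation $\Gamma^{rs}_t$
defined by

\[
\nabla_{\partial^r}\partial^s = \sum_t \Gamma^{rs}_t\,\partial^t;
\]

similarly $\Gamma^{*rs}_t$ is defined via

\[
\nabla^*_{\partial^r}\partial^s = \sum_t \Gamma^{*rs}_t\,\partial^t.
\]

The covariant version of the affine connections is denoted by superscripted
$\Gamma$ and $\Gamma^*$:

\[
\Gamma^{ij,k}(u) = g(\nabla_{\partial^i}\partial^j, \partial^k), \qquad \Gamma^{*ij,k}(u) = g(\nabla^*_{\partial^i}\partial^j, \partial^k). \tag{13.20}
\]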


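As in (13.11), the affine connections in $u$-coordinates (expressed in super-
script) and in $x$-coordinates (expressed in subscript) are related via

\[
\Gamma^{rs}_t(u) = \sum_k \left(\sum_{i,j} \frac{\partial x^i}{\partial u_r}\frac{\partial x^j}{\partial u_s}\,\Gamma^k_{ij}(x)
 + \frac{\partial^2 x^k}{\partial u_r\,\partial u_s}\right)\frac{\partial u_t}{\partial x^k} \tag{13.21}
\]

and

\[
\Gamma^{rs,t}(u) = \sum_{i,j,k} \frac{\partial x^i}{\partial u_r}\frac{\partial x^j}{\partial u_s}\frac{\partial x^k}{\partial u_t}\,\Gamma_{ij,k}(x)
 + \frac{\partial^2 x^t}{\partial u_r\,\partial u_s}. \tag{13.22}
\]

Similar relations hold between $\Gamma^{*rs}_t(u)$ and $\Gamma^{*k}_{ij}(x)$, and between $\Gamma^{*rs,t}(u)$
and $\Gamma^*_{ij,k}(x)$.
Analogous to (13.15), we have the following identity,

\[
\frac{\partial^2 x^t}{\partial u_s\,\partial u_r} = \frac{\partial g^{rt}(u)}{\partial u_s} = \Gamma^{rs,t}(u) + \Gamma^{*ts,r}(u),
\]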


which leads to 


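Proposition 13.1. Under biorthogonal coordinates, the component forms of
the metric tensor satisfy

\[
\sum_j g^{ij}(u)\,g_{jk}(x) = \delta^i_k,
\]

while the pair of conjugate connections $\Gamma$, $\Gamma^*$ satisfies

\[
\Gamma^{*ts,r}(u) = -\sum_{i,j,k} g^{ir}(u)\,g^{js}(u)\,g^{kt}(u)\,\Gamma_{ij,k}(x) \tag{13.23}
\]

and

\[
\Gamma^{*ts}_{r}(u) = -\sum_j g^{js}(u)\,\Gamma^{t}_{rj}(x). \tag{13.24}
\]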


Next, we discuss the conditions under which biorthogonal coordinates exist
on an arbitrary Riemannian manifold. From its definition (13.18), we can
easily show that

Lemma 13.2. A Riemannian manifold $\mathfrak{M}$ with metric $g_{ij}$ admits biorthog-
onal coordinates if and only if $\partial g_{ij}/\partial x^k$ is totally symmetric,⁵

\[
\frac{\partial g_{ij}(x)}{\partial x^k} = \frac{\partial g_{ik}(x)}{\partial x^j}. \tag{13.25}
\]

That (13.25) is satisfied for biorthogonal coordinates is evident by virtue
of (13.17) and (13.18). Conversely, given (13.25), there must be $n$ functions
$u_i(x)$, $i = 1,2,\ldots,n$ such that

\[
\frac{\partial u_i(x)}{\partial x^j} = g_{ij}(x) = g_{ji}(x) = \frac{\partial u_j(x)}{\partial x^i}.
\]

The above identity, in turn, implies that there exists a function $\Phi$ such that
$u_i = \partial_i\Phi$ and, by positive definiteness of $g_{ij}$, $\Phi$ would have to be a strictly
convex function. In this case, the $x$- and $u$-variables satisfy (13.7), and the
pair of convex functions, $\Phi$ and its conjugate $\tilde\Phi$, is related to $g_{ij}$ and $g^{ij}$ by

\[
g_{ij}(x) = \frac{\partial^2\Phi(x)}{\partial x^i\,\partial x^j} \ \longleftrightarrow\ g^{ij}(u) = \frac{\partial^2\tilde\Phi(u)}{\partial u_i\,\partial u_j}.
\]

5 Note that $(\partial g_{ij}/\partial x^k) \equiv \partial_k\big(g(\partial_i,\partial_j)\big) \neq (\partial_k g)(\partial_i,\partial_j)$; the latter is necessarily totally
symmetric whenever there exists a pair of torsion-free connections $\Gamma$, $\Gamma^*$ that are conjugate
with respect to $g$.
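As a concrete illustration of Lemma 13.2 and the relations just stated, consider the potential $\Phi(x) = \sum_i e^{x^i}$ on $\mathbb{R}^n$. This potential is our own choice for a worked example and is not taken from the source; it is convenient only because every quantity has a closed form.

```latex
\begin{align*}
u_i &= \partial_i\Phi(x) = e^{x^i}, &
g_{ij}(x) &= \frac{\partial u_i}{\partial x^j} = \delta_{ij}\,e^{x^i}, \\
\frac{\partial g_{ij}(x)}{\partial x^k} &= \delta_{ij}\,\delta_{jk}\,e^{x^i}
  \quad\text{(totally symmetric, cf. (13.25))}, &
\tilde\Phi(u) &= \sum_i \bigl(u_i \log u_i - u_i\bigr), \\
x^i &= \frac{\partial\tilde\Phi(u)}{\partial u_i} = \log u_i, &
g^{ij}(u) &= \frac{\partial^2\tilde\Phi(u)}{\partial u_i\,\partial u_j}
  = \frac{\delta^{ij}}{u_i}.
\end{align*}
% Consistency check: the two metric component matrices are mutually inverse
% once u is evaluated at u_i = e^{x^i}.
\[
\sum_j g^{ij}(u)\,g_{jk}(x) = \frac{e^{x^i}}{u_i}\,\delta^i_k = \delta^i_k .
\]
```

In this example the Jacobian of $x \mapsto u$ is the Hessian of $\Phi$, so the coordinates $x$ and $u = [e^{x^1},\ldots,e^{x^n}]$ are biorthogonal in the sense of (13.18).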


It follows from Lemma 13.2 that a necessary and sufficient condition
for a Riemannian manifold to admit biorthogonal coordinates is that its
Levi-Civita connection is given by

\[
\hat\Gamma_{ij,k}(x) = \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right)
 = \frac{1}{2}\,\frac{\partial g_{ij}}{\partial x^k}.
\]

From this, the following can be shown.


Proposition 13.2. A Riemannian manifold $\{\mathfrak{M}, g\}$ admits a pair of bior-
thogonal coordinates $x$ and $u$ if and only if there exists a pair of conjugate
connections $\gamma$ and $\gamma^*$ such that $\gamma_{ij,k}(x) = 0$, $\gamma^{*rs,t}(u) = 0$. In other words,
biorthogonal coordinates are affine coordinates for dually flat conjugate con-
nections.

In fact, we can now define a pair of torsion-free connections by

\[
\gamma_{ij,k}(x) = 0, \qquad \gamma^*_{ij,k}(x) = \frac{\partial g_{ij}}{\partial x^k},
\]

and show that they are conjugate with respect to $g$; that is, they satisfy
(13.14). This is to say that we select an affine connection $\gamma$ such that $x$ is its
affine coordinate. From (13.22), when $\gamma^*$ is expressed in $u$-coordinates,

\begin{align*}
\gamma^{*rs,t}(u) &= \sum_{i,j,k} g^{ir}(u)\,g^{js}(u)\,\frac{\partial g_{ij}(x)}{\partial x^k}\,g^{kt}(u) + \frac{\partial g^{ts}(u)}{\partial u_r} \\
 &= \sum_{i,j} g^{ir}(u)\left(-\frac{\partial g^{js}(u)}{\partial u_t}\,g_{ij}(x)\right) + \frac{\partial g^{ts}(u)}{\partial u_r} \\
 &= -\frac{\partial g^{rs}(u)}{\partial u_t} + \frac{\partial g^{ts}(u)}{\partial u_r} = 0,
\end{align*}

where the last step uses the total symmetry of $\partial g^{rs}(u)/\partial u_t$, the third
derivative of $\tilde\Phi$.
This implies that $u$ is an affine coordinate system with respect to $\gamma^*$. There-
fore, biorthogonal coordinates are affine coordinates for a pair of dually flat
connections. Such a manifold $\{\mathfrak{M}, g, \gamma, \gamma^*\}$ is called a “Hessian manifold”
(Shima, 2007, Shima and Yagi, 1997). It is a special case of the $\alpha$-Hessian
manifold (introduced in Section 13.3.2).


13.2.5 Equiaffine Structure and Parallel Volume Form
on a Manifold


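For a restrictive class of connections, called “equiaffine” connections, the
manifold $\mathfrak{M}$ may admit uniquely a parallel volume form $\omega(x)$. Here, a volume
form is a skew-symmetric multilinear map from $n$ linearly independent vectors
to a nonzero scalar, and “parallel” is in the sense that $(\partial_i\omega)(\partial_1,\ldots,\partial_n) = 0$
where

\[
(\partial_i\omega)(\partial_1,\ldots,\partial_n) \equiv (\nabla_{\partial_i}\omega)(\partial_1,\ldots,\partial_n)
 = \partial_i\big(\omega(\partial_1,\ldots,\partial_n)\big) - \sum_{l=1}^{n} \omega(\ldots,\nabla_{\partial_i}\partial_l,\ldots).
\]

Applying (13.10), the equiaffine condition becomes

\begin{align*}
\partial_i\big(\omega(\partial_1,\ldots,\partial_n)\big)
 &= \sum_{l=1}^{n} \omega\Big(\ldots,\sum_{k=1}^{n}\Gamma^k_{il}\,\partial_k,\ldots\Big) \\
 &= \sum_{l=1}^{n}\sum_{k=1}^{n} \Gamma^k_{il}\,\delta^l_k\ \omega(\partial_1,\ldots,\partial_n)
 = \omega(\partial_1,\ldots,\partial_n)\sum_{l=1}^{n}\Gamma^l_{il}(x)
\end{align*}

or

\[
\sum_l \Gamma^l_{il}(x) = \frac{\partial\log\omega(x)}{\partial x^i}. \tag{13.26}
\]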

Whether a connection is equiaffine is related to the so-called Ricci tensor
Ric, defined as the contraction of the curvature tensor $R$,

\[
\mathrm{Ric}_{ij}(x) = \sum_k R^k_{ikj}(x). \tag{13.27}
\]

For a torsion-free connection $\Gamma^k_{ij} = \Gamma^k_{ji}$, applying the definition of the curva-
ture tensor $R$ to the above yields

\[
\mathrm{Ric}_{ij} - \mathrm{Ric}_{ji}
 = \frac{\partial}{\partial x^i}\Big(\sum_k \Gamma^k_{jk}(x)\Big) - \frac{\partial}{\partial x^j}\Big(\sum_k \Gamma^k_{ik}(x)\Big)
 = \sum_k R^k_{kij}. \tag{13.28}
\]

One immediately sees that the existence of a function $\omega$ satisfying (13.26)
is equivalent to the right side of (13.28) being identically zero. In other
words, the necessary and sufficient condition for a torsion-free connection
to be equiaffine is that its Ricci tensor is symmetric, $\mathrm{Ric}_{ij} = \mathrm{Ric}_{ji}$, or equiv-
alently, $\sum_k R^k_{kij} = 0$.

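Making use of (13.26), it is easy to show that the parallel volume form of
a Levi-Civita connection $\hat\Gamma$ is given by

\[
\hat\omega(x) = \sqrt{\det[g_{ij}(x)]} \ \longleftrightarrow\ \hat\omega(u) = \sqrt{\det[g^{ij}(u)]}.
\]

The parallel volume forms $\omega, \omega^*$ associated with $\Gamma$ and $\Gamma^*$ satisfy (apart from
a positive, multiplicative constant)

\[
\omega(x)\,\omega^*(x) = (\hat\omega(x))^2 = \det[g_{ij}(x)], \tag{13.29}
\]
\[
\omega(u)\,\omega^*(u) = (\hat\omega(u))^2 = \det[g^{ij}(u)]. \tag{13.30}
\]

Let us now consider the parallel volume forms under biorthogonal coordi-
nates. Contracting the indices $t$ with $r$ in (13.24), and invoking (13.26), we
obtain

\[
\frac{\partial\log\omega^*(u)}{\partial u_s} + \sum_j \frac{\partial x^j}{\partial u_s}\frac{\partial\log\omega(x)}{\partial x^j}
 = \frac{\partial\log\omega^*(u)}{\partial u_s} + \frac{\partial\log\omega(x(u))}{\partial u_s} = 0.
\]

After integration,

\[
\omega^*(u)\,\omega(x) = \text{const}. \tag{13.31}
\]

From (13.29) to (13.31),

\[
\omega(u)\,\omega^*(x) = \text{const}. \tag{13.32}
\]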

The relations (13.31) and (13.32) indicate that the volume forms of the
pair of conjugate connections, when expressed in biorthogonal coordinates
respectively, are inversely proportional to each other. Note that $\omega(x) \equiv
\omega(\partial_1,\ldots,\partial_n)$ and $\omega^*(x) \equiv \omega^*(\partial_1,\ldots,\partial_n)$, as skew-symmetric multilinear
maps, transform to $\omega(u) \equiv \omega(\partial^1,\ldots,\partial^n)$ and $\omega^*(u) \equiv \omega^*(\partial^1,\ldots,\partial^n)$ via

\[
\omega(x) = \det[J_{ij}(x)]\,\omega(u) \ \longleftrightarrow\ \omega^*(u) = \det[J^{ij}(u)]\,\omega^*(x),
\]

where $\det[J_{ij}(x)] = \det[g_{ij}(x)] = (\det[J^{ij}(u)])^{-1} = (\det[g^{ij}(u)])^{-1}$.

When the pair of equiaffine connections $\Gamma$, $\Gamma^*$ are further assumed to be
dually flat, then the entire family of $\alpha$-connections $\Gamma^{(\alpha)}$ given by (13.16) are
equiaffine (Takeuchi and Amari, 2005, Matsuzoe et al., 2006, Zhang, 2007).
The $\Gamma^{(\alpha)}$-parallel volume element $\omega^{(\alpha)}$ can be shown to be given by

\[
\omega^{(\alpha)} = \omega^{(1+\alpha)/2}\,(\omega^*)^{(1-\alpha)/2}.
\]

Clearly,

\[
\omega^{(\alpha)}(x)\,\omega^{(-\alpha)}(x) = \det[g_{ij}(x)] \ \longleftrightarrow\ \omega^{(\alpha)}(u)\,\omega^{(-\alpha)}(u) = \det[g^{ij}(u)].
\]


13.2.6 Affine Hypersurface Immersion (of
Co-Dimension One)


We next discuss dualistic geometry from convex functions as related to hy-
persurfaces in affine space, which is the subject of study in affine differential
geometry (Simon et al., 1991, Nomizu and Sasaki, 1994).

Let $A^{n+1}$ be the standard affine space of dimension $n+1$, and $\mathfrak{M}$ an
$n$-dimensional manifold immersed into $A^{n+1}$ as a hypersurface with affine
coordinates $f = [f^1,\ldots,f^{n+1}]$; that is, $f: \mathfrak{M} \to A^{n+1}$. Assume that the
local coordinate system on $\mathfrak{M}$ is $x = [x^1,\ldots,x^n]$. Let $\xi = [\xi^1,\ldots,\xi^{n+1}]$ be
a vector field defined on $\mathfrak{M}$ that is “transversal,” that is, nowhere tangential
to $\mathfrak{M}$. Denote the vector space associated with $A^{n+1}$ as $V$, with $\dim(V) =
n+1$, and the canonical pairing of $V$ with its dual vector space $\tilde V$ (with
$\dim(\tilde V) = n+1$) as $\langle\cdot,\cdot\rangle_{n+1}$; see (13.3). The duplet $\{f, \xi\}$ is called an “affine
immersion.” In local coordinates, they can be explicitly written as functions
of $x$: $\{f(x), \xi(x)\}$, where $f$ is valued in $A^{n+1}$ and $\xi$ is valued in $V$.

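Because the tangent space $T_x(\mathfrak{M})$ is spanned by

\[
\left[\frac{\partial f^a}{\partial x^1},\ldots,\frac{\partial f^a}{\partial x^n}\right] \qquad (a = 1,\ldots,n+1),
\]

we may decompose the second derivatives of $f$ as

\[
\frac{\partial^2 f^a}{\partial x^i\,\partial x^j} = \sum_k \Gamma^k_{ij}\,\frac{\partial f^a}{\partial x^k} + h_{ij}\,\xi^a \qquad (i,j = 1,\ldots,n), \tag{13.33}
\]

where $h_{ij} = h_{ji}$ (called “induced bilinear form” or “affine fundamental
form”); if $f$ is convex, then $h_{ij}$ is positive definite. The set of coefficients
$\Gamma^k_{ij}$ is called the “induced connection” on $\mathfrak{M}$, because it is induced by a flat
connection on $A^{n+1}$. Under coordinate transform, these coefficients can be
shown to transform according to (13.21). Similarly, decompose the derivative
of $\xi^a$ as

\[
\frac{\partial\xi^a}{\partial x^i} = -\sum_k S^k_i\,\frac{\partial f^a}{\partial x^k} + \tau_i\,\xi^a, \tag{13.34}
\]

where $S^k_i$ is known as the “affine shape operator,” and $\tau_i$ is a 1-form on $\mathfrak{M}$
called the “transversal connection form;” when $\tau = 0$ everywhere on $\mathfrak{M}$, the
affine immersion $\{f, \xi\}$ is called “equiaffine.”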

We define a volume form $\omega$ on $\mathfrak{M}$ arising out of the immersion of $\{f, \xi\}$,

\[
\omega(\partial_1,\ldots,\partial_n) = \mathrm{Det}(\partial_1 f,\ldots,\partial_n f,\ \xi),
\]

where Det is the determinant form on $A^{n+1}$, and $\partial_i f$ is the vector field
$\partial_i f = [\partial_i f^1,\ldots,\partial_i f^{n+1}]$. The covariant derivative of $\omega$ is given as follows
(see Nomizu and Sasaki, 1994):

\[
(\nabla_{\partial_i}\omega)(\partial_1,\ldots,\partial_n) = \tau_i\,\omega(\partial_1,\ldots,\partial_n).
\]

This implies that the induced volume form $\omega$ is parallel with respect to the
induced connection $\nabla$ if and only if $\{f, \xi\}$ is equiaffine: $\tau = 0$.

In order to consider the geometry induced from convex functions and bi-
orthogonal coordinates, we consider a special kind of affine immersion called
“graph immersion:”

\[
f = [x^1,\ldots,x^n,\ \Phi(x)], \qquad \xi = [0,\ldots,0,\ 1], \tag{13.35}
\]

where $\Phi$ is some nondegenerate (in particular, convex) function. Applying
(13.33), we obtain the induced connection $\Gamma^k_{ij}(x) = 0$ and the affine funda-
mental form $h_{ij}(x)$ as the Hessian of $\Phi$,

\[
h_{ij}(x) = \frac{\partial^2\Phi(x)}{\partial x^i\,\partial x^j}.
\]

Thus the geometry of a graph affine immersion coincides with the Hessian
geometry induced from a convex function. Because the transversal vector field
$\xi$ is parallel along $f$, from (13.34), obviously $\mathfrak{M}$ has an equiaffine structure.
We can define the “dual” of graph immersion, $\{\tilde f, \tilde\xi\}$, mapping $\mathfrak{M}$ to $A^{n+1}$
as another graph. Here $\tilde f = [u_1,\ldots,u_n,\ \tilde\Phi(u)]$, with $\tilde\Phi$ and $u$ given by (13.6)
and (13.7), respectively. The transversal vector field $\tilde\xi = [0,\ldots,0,\ 1]$ is valued
in $\tilde V$, the dual vector space. The affine fundamental form $\tilde h^{ij}$ is

\[
\tilde h^{ij}(u) = \frac{\partial^2\tilde\Phi(u)}{\partial u_i\,\partial u_j}.
\]

Because of the identity

\[
\frac{\partial^2\tilde\Phi(u)}{\partial u_i\,\partial u_j} = \sum_{k,l} \frac{\partial x^k}{\partial u_i}\frac{\partial x^l}{\partial u_j}\,\frac{\partial^2\Phi(x)}{\partial x^k\,\partial x^l},
\]

such affine fundamental form transforms as a 0-2 tensor
(even though second derivatives in general do not transform in a tensorlike
fashion). This means that for dual graph immersions $\{f, \xi\}$ and $\{\tilde f, \tilde\xi\}$, the
induced affine fundamental form is one and the same, $h = \tilde h$. The induced
objects $\{\mathfrak{M}, h, \Gamma, \tilde\Gamma\}$ form a Hessian structure (i.e., induced connections are
dually flat).

More generally, for an arbitrary affine immersion, we can introduce the
notion of “co-normal mapping,” defined as $\zeta: \mathfrak{M} \to \tilde V$ with

\[
\langle \zeta(x),\ \xi(x)\rangle_{n+1} = 1, \tag{13.36}
\]
\[
\langle \zeta(x),\ \partial_i f(x)\rangle_{n+1} = 0, \tag{13.37}
\]

that is,

\[
\sum_{a=1}^{n+1} \zeta_a(x)\,\xi^a(x) = 1, \qquad
\sum_{a=1}^{n+1} \zeta_a(x)\,\frac{\partial f^a(x)}{\partial x^i} = 0 \quad (i = 1,\ldots,n).
\]

Intuitively, the co-normal map is a uniquely defined “normal” vector of the
tangent hyperplane at $f(x)$. (This property comes from (13.37).) The co-
normal map is not a unit vector; the “length” of the map is normalized by
(13.36). Note that the words “length” and “normal” are in quotation marks
because no metric has ever been introduced on $V$ or $\tilde V$: normalization is
through the pairing operation $\langle\cdot,\cdot\rangle$.

When $\{f, \xi\}$ is equiaffine, then the co-normal map $\zeta$ can be viewed as an
immersion from $\mathfrak{M}$ to $A^{n+1}$ (Nomizu and Sasaki, 1994, p. 57). Specifically,
$\zeta(\mathfrak{M})$ is taken to be (the negative of) the positional vector field (with respect
to a center point) in addition to being the transversal vector field. In this case
$\{\tilde f, \tilde\xi\} \equiv \{-\zeta, \zeta\}$ is an affine immersion, called the “co-normal immersion” of
$\{f, \xi\}$. We also call $\{-\zeta, \zeta\}$ a “centroaffine immersion” because the immersion
has a center, with the position vector $-\zeta$ (the first element in the duplet)
transversal to its image $\mathfrak{M}$. We denote by $\tilde\Gamma, \tilde h, \tilde\tau, \tilde S, \ldots$ the induced objects
of $\{-\zeta, \zeta\}$. Then we have the following formulae (see Simon et al., 1991):

\[
\tilde\Gamma_{kj,i} = -\Gamma_{ki,j} + \partial_k h_{ij}, \tag{13.38}
\]
\[
\tilde h_{ij} = \sum_{k=1}^{n} S^k_i\,h_{kj}, \tag{13.39}
\]
\[
\tilde\tau_i = 0, \qquad \tilde S = I.
\]

Equation (13.38) implies that $\nabla$ and $\tilde\nabla$ are mutually conjugate with respect
to $h$. Note that $\Gamma$ and $\tilde\Gamma$ are, respectively, the induced connections when $\mathfrak{M}$
is immersed into $A^{n+1}$ in two distinct ways, $\{f, \xi\}$ and $\{-\zeta, \zeta\}$.

Suppose that $\{f, \xi\}$ is a graph affine immersion with respect to some con-
vex function, and $\{-\zeta, \zeta\}$ is the co-normal immersion of $\{f, \xi\}$. From (13.34),
the affine shape operator $S$ of $\{f, \xi\}$ vanishes. This implies that $\tilde h = 0$ from
(13.39). Thus, although the co-normal map of an equiaffine immersion is a
centroaffine hypersurface in $A^{n+1}$, the co-normal map of graph immersion
has its image lie on an affine hyperplane in $A^{n+1}$.

For an affine immersion $\{f, \xi\}$ and the co-normal immersion $\{-\zeta, \zeta\}$, we
define the “geometric divergence” $G$ on any two points on $\mathfrak{M}$ by

\[
G(x,y) = \langle f(x) - f(y),\ \zeta(y)\rangle_{n+1} = \sum_{a=1}^{n+1} \big(f^a(x) - f^a(y)\big)\,\zeta_a(y).
\]

For a graph immersion given by (13.35), we can explicitly solve for $\zeta$ from
(13.36) and (13.37):

\[
\zeta = [-\partial_1\Phi,\ldots,-\partial_n\Phi,\ 1].
\]

Therefore, the expression for geometric divergence becomes

\[
G(x,y) = -\langle x - y,\ (\partial\Phi)(y)\rangle_n + \Phi(x) - \Phi(y) = B_\Phi(x,y);
\]

geometric divergence is nothing but Bregman divergence (13.2), see Kurose
(1994) and Matsuzoe (1998).
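The identification of geometric divergence with Bregman divergence can be checked numerically for a graph immersion. The sketch below assumes the potential $\Phi(x) = \sum_i e^{x^i}$ and two arbitrary points; both are illustrative choices, not from the text. It builds $f$ and $\zeta$ as in (13.35) and the co-normal formula above, verifies the defining conditions (13.36) and (13.37), and compares $G$ with $B_\Phi$.

```python
import math

def phi(x):
    """Illustrative strictly convex potential."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def pair(a, b):
    """Canonical pairing: sum of componentwise products."""
    return sum(ai * bi for ai, bi in zip(a, b))

def f(x):
    """Graph immersion (13.35): f = [x^1, ..., x^n, Phi(x)]."""
    return list(x) + [phi(x)]

def conormal(x):
    """Co-normal map of the graph immersion: [-d_1 Phi, ..., -d_n Phi, 1]."""
    return [-g for g in grad_phi(x)] + [1.0]

def tangent(x, i):
    """d_i f = e_i + (d_i Phi) e_{n+1}."""
    t = [0.0] * (len(x) + 1)
    t[i] = 1.0
    t[-1] = grad_phi(x)[i]
    return t

def geometric_divergence(x, y):
    """G(x, y) = <f(x) - f(y), zeta(y)> in n + 1 dimensions."""
    diff = [a - b for a, b in zip(f(x), f(y))]
    return pair(diff, conormal(y))

def bregman(x, y):
    """B_Phi(x, y) as in (13.2)."""
    return phi(x) - phi(y) - pair([a - b for a, b in zip(x, y)], grad_phi(y))

x = [0.1, -0.4]
y = [0.7, 0.2]
xi = [0.0, 0.0, 1.0]                      # transversal field of (13.35)

print(pair(conormal(y), xi))              # normalization (13.36)
print([pair(conormal(y), tangent(y, i)) for i in range(2)])  # (13.37)
print(geometric_divergence(x, y), bregman(x, y))
```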


13.2.7 Centroaffine Immersion of Co-Dimension Two 


Now we consider affine immersion of $\mathfrak{M}$ (with $\dim(\mathfrak{M}) = n$) into a co-dimen-
sion two affine space $A^{n+2}$ (rather than the co-dimension one affine space
$A^{n+1}$ as discussed in the last section). In this case, in addition to specifying
the immersion, denoted by $f: \mathfrak{M} \to A^{n+2}$, we need to specify two noncollinear
vector fields, both “transversal” on $\mathfrak{M}$. The vector space is denoted as $V$ with
$\dim(V) = n+2$; the dual vector space is denoted as $\tilde V$ with $\dim(\tilde V) = n+2$.
To simplify the situation, we consider centroaffine immersion such that one of
the transversal vector fields is the (negative of the) positional vector $-f$ and
the other is, as before, denoted $\xi$; that is, the affine immersion is denoted as
$\{f, -f, \xi\}$; the elements are valued in $A^{n+2}$, $V$, $V$, respectively. The second
derivatives of $f$ and the first derivatives of $\xi$ are decomposed as follows
(for $i,j = 1,\ldots,n$; $a = 1,\ldots,n+2$):

\[
\frac{\partial^2 f^a}{\partial x^i\,\partial x^j} = \sum_k \Gamma^k_{ij}\,\frac{\partial f^a}{\partial x^k} + h_{ij}\,\xi^a - t_{ij}\,f^a,
\]
\[
\frac{\partial\xi^a}{\partial x^i} = -\sum_k S^k_i\,\frac{\partial f^a}{\partial x^k} + \tau_i\,\xi^a - \kappa_i\,f^a.
\]

As in affine immersion of co-dimension one, we call $\Gamma^k_{ij}$ the “induced con-
nection,” $h_{ij}$ the “affine fundamental form,” $\tau_i$ the “transversal connection
form,” and $S^k_i$ the “affine shape operator.” Below, we assume that $h$ is posi-
tive definite (i.e., $f$ is strictly convex) and $\tau = 0$ (the centroaffine immersion
is equiaffine).

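We denote the “dual map” of $\{f, -f, \xi\}$ as another centroaffine map tak-
ing the form of $\{\tilde f, -\tilde f, \zeta\}$, where the elements are valued in $A^{n+2}$, $\tilde V$, $\tilde V$,
respectively; $\tilde f$ and $\zeta$ are specified by

\[
\langle \tilde f(x),\ \xi(x)\rangle_{n+2} = 1, \qquad \langle \zeta(x),\ \xi(x)\rangle_{n+2} = 0,
\]

together with the analogous pairings against $f$ and $\partial_i f$; in components,

\[
\sum_{a=1}^{n+2} \tilde f_a(x)\,\xi^a(x) = 1, \qquad \sum_{a=1}^{n+2} \zeta_a(x)\,\xi^a(x) = 0,
\]
\[
\sum_{a=1}^{n+2} \tilde f_a(x)\,f^a(x) = 0, \qquad \sum_{a=1}^{n+2} \zeta_a(x)\,f^a(x) = 1, \tag{13.40}
\]
\[
\sum_{a=1}^{n+2} \tilde f_a(x)\,\frac{\partial f^a}{\partial x^i}(x) = 0, \qquad
\sum_{a=1}^{n+2} \zeta_a(x)\,\frac{\partial f^a}{\partial x^i}(x) = 0 \quad (i = 1,\ldots,n). \tag{13.41}
\]

Denote the induced objects as $\tilde\Gamma, \tilde h, \tilde\tau, \ldots$; we have the following formulae
(see Nomizu and Sasaki, 1994, Matsuzoe, 1998):

\[
\partial_k h_{ij} = \Gamma_{ki,j} + \tilde\Gamma_{kj,i},
\]
\[
\tilde h_{ij} = h_{ij}, \tag{13.42}
\]
\[
\tilde\tau_i = 0.
\]

We remark that (13.42) is different from (13.39) of the co-dimension one case.
If a centroaffine immersion $\{f, -f, \xi\}$ induces $\{g, \Gamma\}$ on $\mathfrak{M}$, then the dual map
$\{\tilde f, -\tilde f, \zeta\}$ induces $\{g, \Gamma^*\}$ on $\mathfrak{M}$. This implies that the theory of centroaffine
immersions of co-dimension two is more useful than that of affine immersions
of co-dimension one when we discuss the duality of statistical manifolds.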
Consider the special case of graph immersion (of co-dimension two)
$\{f, -f, \xi\}$; that is,

\[
f = [x^1,\ldots,x^n,\ \Phi(x),\ 1], \qquad \xi = [0,\ldots,0,\ 1,\ 0], \tag{13.43}
\]

where $\Phi(x)$ is some convex function. If $\{f, -f, \xi\}$ has other representations,
they are centroaffinely congruent (linearly congruent) to (13.43); hence it
suffices to consider (13.43).

From straightforward calculations, the dual map $\{\tilde f, -\tilde f, \zeta\}$ of $\{f, -f, \xi\}$
takes the form

\[
\tilde f = [-u_1,\ldots,-u_n,\ 1,\ \tilde\Phi(u)], \qquad \zeta = [0,\ldots,0,\ 0,\ 1]. \tag{13.44}
\]

The left side equation in (13.40) then gives

\[
-\sum_{i=1}^{n} x^i u_i + \Phi(x) + \tilde\Phi(u) = 0,
\]

13 Dualistic Geometry from Convex Functions 455 
and the left side equation in (13.41) is

\[
-u_i + \frac{\partial\Phi}{\partial x^i} = 0.
\]

Thus, $\tilde\Phi$ is the convex conjugate of $\Phi$ as in (13.6), and $u = [u_1,\ldots,u_n]$ is the
conjugate variable as in (13.7). For graph immersion, it is easy to check that
$\Gamma_{ij,k} = 0$, $S^k_i = 0$, $t_{ij} = 0$, $\tau_i = 0$, $\kappa_i = 0$ for all indices and

\[
h_{ij}(x) = \frac{\partial^2\Phi(x)}{\partial x^i\,\partial x^j}.
\]

The same is true for induced objects in dual immersion.

Just as in the case of equiaffine immersion $\{f, \xi\}$ of co-dimension one
and the associated co-normal map $\{-\zeta, \zeta\}$, we can construct the geometric
divergence $G$ on $\mathfrak{M}$ for centroaffine immersion $\{f, -f, \xi\}$ of co-dimension two
and the associated dual map $\{\tilde f, -\tilde f, \zeta\}$:

\[
G(x,y) = \langle \tilde f(y),\ f(x) - f(y)\rangle_{n+2}.
\]

For graph immersion, we substitute $f$ and $\tilde f$ in (13.43) and (13.44) to yield

\begin{align*}
G(x,y) &= -\langle x,\ (\partial\Phi)(y)\rangle_n + \Phi(x) + \tilde\Phi\big((\partial\Phi)(y)\big) \\
 &= B_\Phi(x,y).
\end{align*}
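The co-dimension two construction can be checked in the same way. The sketch assumes the potential $\Phi(x) = \sum_i e^{x^i}$ with conjugate $\tilde\Phi(u) = \sum_i (u_i\log u_i - u_i)$; this pair is an illustrative choice, not from the text. It builds $f$ and $\tilde f$ following (13.43) and (13.44), confirms the left equation of (13.40), and compares $G$ with $B_\Phi$.

```python
import math

def phi(x):
    """Illustrative potential Phi(x) = sum_i exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def phi_conj(u):
    """Closed-form convex conjugate of phi, valid for u_i > 0."""
    return sum(ui * math.log(ui) - ui for ui in u)

def pair(a, b):
    """Canonical pairing in n + 2 dimensions (or n, for plain vectors)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def f(x):
    """Graph immersion (13.43): f = [x^1, ..., x^n, Phi(x), 1]."""
    return list(x) + [phi(x), 1.0]

def f_dual(y):
    """Dual map (13.44) at the point y, with u = (d Phi)(y)."""
    u = grad_phi(y)
    return [-ui for ui in u] + [1.0, phi_conj(u)]

def geometric_divergence(x, y):
    """G(x, y) = <f~(y), f(x) - f(y)> in n + 2 dimensions."""
    diff = [a - b for a, b in zip(f(x), f(y))]
    return pair(f_dual(y), diff)

def bregman(x, y):
    """B_Phi(x, y) as in (13.2)."""
    return phi(x) - phi(y) - pair([a - b for a, b in zip(x, y)], grad_phi(y))

x = [0.1, -0.4]
y = [0.7, 0.2]

print(pair(f_dual(y), f(y)))                       # left equation of (13.40)
print(geometric_divergence(x, y), bregman(x, y))   # the two values agree
```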


In both the equiaffine immersion of co-dimension one (discussed in Sec- 
tion 13.2.6) and centroaffine immersion of co-dimension two (discussed here), 
the notion of geometric divergence is a generalization of the Bregman (canon- 
ical) divergence on a dually flat space. 


Proposition 13.3. (Kurose, 1994, Matsuzoe, 1998) Let $\Phi$ be a strictly
convex function on $\mathbb{R}^n$. Then geometric divergence $G(x,y): V \times V \to \mathbb{R}$ in-
duced by the affine immersion of $\Phi$ as a graph in $A^{n+1}$ or by the centroaffine
immersion of $\Phi$ as a graph in $A^{n+2}$ equals the Bregman divergence $B_\Phi(x,y)$.


13.3 The $\alpha$-Hessian Structure Associated with
Convex-Induced Divergence


The discussion at the end of the last section anticipates a close relation between
convex functions and the Riemannian structure on a differentiable manifold
whose coordinates are the variables of the convex functions. On such a mani-
fold, divergence functions take the role of pseudo-distance functions that are
nonnegative but need not be symmetric. That dualistic Riemannian manifold
structures can be induced from a divergence function was first demonstrated
by S. Eguchi.


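Lemma 13.3. (Eguchi, 1983, 1992) A divergence function induces a Riemannian
metric $g$ and a pair of conjugate connections $\Gamma$, $\Gamma^*$ given as

\[ g_{ij}(x) = -\left.\partial_{x^i}\partial_{y^j} D(x,y)\right|_{y=x}, \tag{13.45} \]
\[ \Gamma_{ij,k}(x) = -\left.\partial_{x^i}\partial_{x^j}\partial_{y^k} D(x,y)\right|_{y=x}, \tag{13.46} \]
\[ \Gamma^*_{ij,k}(x) = -\left.\partial_{y^i}\partial_{y^j}\partial_{x^k} D(x,y)\right|_{y=x}. \tag{13.47} \]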


It is easily verifiable that $g_{ij}$, $\Gamma_{ij,k}$, $\Gamma^*_{ij,k}$ as given above satisfy (13.15).
Furthermore, under arbitrary coordinate transform, these quantities behave
properly as desired. Equations (13.45)–(13.47) link a divergence function $D$
to the dualistic Riemannian structure $\{M, g, \Gamma, \Gamma^*\}$.

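Applying Lemma 13.3 to Bregman divergence $B_\Phi(x,y)$ given by (13.2)
yields

\[ g_{ij}(x) = \frac{\partial^2 \Phi(x)}{\partial x^i \partial x^j} \]

and

\[ \Gamma_{ij,k}(x) = 0, \qquad \Gamma^*_{ij,k}(x) = \frac{\partial^3 \Phi(x)}{\partial x^i \partial x^j \partial x^k}. \]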
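
As a sanity check on Lemma 13.3 applied to the Bregman divergence, the following sketch estimates the three mixed derivatives by finite differences in one dimension. It is a rough illustration, not a general implementation: the potential $\Phi = \exp$, the step size, and the helper name `eguchi_1d` are all assumptions made here.

```python
import math

def bregman(x, y, phi=math.exp, dphi=math.exp):
    """One-dimensional Bregman divergence B_Phi(x, y), here with Phi = exp."""
    return phi(x) - phi(y) - (x - y) * dphi(y)

def eguchi_1d(D, x, h=1e-2):
    """Finite-difference estimates of (13.45)-(13.47) on the diagonal y = x.

    Returns (g, Gamma, Gamma_star) for a one-dimensional divergence D(x, y).
    """
    # g = -d_x d_y D: central mixed difference.
    g = -(D(x + h, x + h) - D(x + h, x - h)
          - D(x - h, x + h) + D(x - h, x - h)) / (4 * h * h)

    # Gamma = -d_x d_x d_y D: second difference in x, central difference in y.
    def d2x(y):
        return D(x + h, y) - 2 * D(x, y) + D(x - h, y)
    gamma = -(d2x(x + h) - d2x(x - h)) / (2 * h ** 3)

    # Gamma* = -d_y d_y d_x D: second difference in y, central difference in x.
    def d2y(xx):
        return D(xx, x + h) - 2 * D(xx, x) + D(xx, x - h)
    gamma_star = -(d2y(x + h) - d2y(x - h)) / (2 * h ** 3)
    return g, gamma, gamma_star

x0 = 0.5
g, gamma, gamma_star = eguchi_1d(bregman, x0)
print(g, gamma, gamma_star, math.exp(x0))
```

For $\Phi = \exp$ the second and third derivatives both equal $e^{x}$, so `g` and `gamma_star` should be close to `math.exp(x0)` while `gamma` should be close to zero.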


Calculating their curvature tensors shows that the pair of connections is dually
flat. Such a manifold is commonly referred to, in affine geometry literature, as
the “Hessian manifold” (see Section 13.2.4), although in the study by Shima (2007),
the potential function $\Phi$ need not be convex but only semidefinite. In u-coordinates,
these geometric quantities can be expressed as


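\[ g^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i \partial u_j}, \]

\[ \Gamma^{*ij,k}(u) = 0, \qquad \Gamma^{ij,k}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i \partial u_j \partial u_k}, \]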


where $\tilde{\Phi}$ is the convex conjugate of $\Phi$. Below, this link from convex functions
to Riemannian manifold is explored in greater detail.


13.3.1 The α-Hessian Geometry


We start by reviewing a main result from Zhang (2004) linking the divergence
function $D^{(\alpha)}_\Phi(x,y)$ defined in (13.5) and the α-Hessian structure.
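Proposition 13.4. (Zhang, 2004) The manifold $\{M, g_x, \Gamma_x^{(\alpha)}, \Gamma_x^{*(\alpha)}\}$ (see footnote 6)
associated with $D^{(\alpha)}_\Phi(x,y)$ is given by

\[ g_{ij}(x) = \Phi_{ij} \tag{13.48} \]

and

\[ \Gamma^{(\alpha)}_{ij,k}(x) = \frac{1-\alpha}{2}\,\Phi_{ijk}, \qquad \Gamma^{*(\alpha)}_{ij,k}(x) = \frac{1+\alpha}{2}\,\Phi_{ijk}. \tag{13.49} \]

Here, $\Phi_{ij}$, $\Phi_{ijk}$ denote, respectively, second and third partial derivatives of
$\Phi(x)$:

\[ \Phi_{ij} = \frac{\partial^2 \Phi(x)}{\partial x^i \partial x^j}, \qquad \Phi_{ijk} = \frac{\partial^3 \Phi(x)}{\partial x^i \partial x^j \partial x^k}. \]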
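
The coefficients in (13.48) and (13.49) can be traced by a short calculation with Lemma 13.3. The derivation below is a sketch that assumes (13.5) has the form quoted later in this section, with $z$ introduced here as shorthand for the mixed argument.

```latex
% Assumed form of (13.5); z is shorthand introduced for this sketch.
\[
D^{(\alpha)}_\Phi(x,y) = \frac{4}{1-\alpha^2}
  \Big[ \tfrac{1-\alpha}{2}\,\Phi(x) + \tfrac{1+\alpha}{2}\,\Phi(y) - \Phi(z) \Big],
\qquad z = \tfrac{1-\alpha}{2}\,x + \tfrac{1+\alpha}{2}\,y .
\]
% Only the term -\Phi(z) depends on both x and y, so by the chain rule
\[
-\partial_{x^i}\partial_{y^j} D^{(\alpha)}_\Phi
  = \frac{4}{1-\alpha^2}\cdot\frac{1-\alpha}{2}\cdot\frac{1+\alpha}{2}\,\Phi_{ij}(z)
  = \Phi_{ij}(z),
\]
\[
-\partial_{x^i}\partial_{x^j}\partial_{y^k} D^{(\alpha)}_\Phi
  = \frac{4}{1-\alpha^2}\Big(\frac{1-\alpha}{2}\Big)^{2}\frac{1+\alpha}{2}\,\Phi_{ijk}(z)
  = \frac{1-\alpha}{2}\,\Phi_{ijk}(z),
\]
\[
-\partial_{y^i}\partial_{y^j}\partial_{x^k} D^{(\alpha)}_\Phi
  = \frac{4}{1-\alpha^2}\Big(\frac{1+\alpha}{2}\Big)^{2}\frac{1-\alpha}{2}\,\Phi_{ijk}(z)
  = \frac{1+\alpha}{2}\,\Phi_{ijk}(z).
\]
% Setting y = x gives z = x, which recovers (13.48) and (13.49).
```

In particular, the metric is independent of α because the prefactor $4/(1-\alpha^2)$ exactly cancels the product of the two mixture weights.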


Recall that an α-Hessian manifold is equipped with an α-independent
metric and a family of α-transitively flat connections $\Gamma^{(\alpha)}$ (i.e., $\Gamma^{(\alpha)}$ satisfying
(13.16) and $\Gamma^{(\pm 1)}$ are dually flat). From (13.49),

\[ \Gamma^{*(\alpha)}_{ij,k} = \Gamma^{(-\alpha)}_{ij,k}, \]

with the Levi-Civita connection given as:

\[ \widehat{\Gamma}_{ij,k}(x) = \frac{1}{2}\,\Phi_{ijk}. \]


Straightforward calculation shows that: 


Corollary 13.1. For α-Hessian manifold $\{M, g_x, \Gamma_x^{(\alpha)}, \Gamma_x^{*(\alpha)}\}$,
(i) The curvature tensor of the α-connection is given by

\[ R^{(\alpha)}_{\mu\nu ij}(x) = \frac{1-\alpha^2}{4} \sum_{l,k} \left( \Phi_{il\nu}\Phi_{jk\mu} - \Phi_{il\mu}\Phi_{jk\nu} \right) \Psi^{lk} = R^{*(\alpha)}_{ij\mu\nu}(x), \]

with $\Psi^{ij}$ being the matrix inverse of $\Phi_{ij}$,
(ii) All α-connections are equiaffine, with the α-parallel volume forms (i.e.,
the volume forms that are parallel under α-connections) given by

\[ \omega^{(\alpha)}(x) = \det[\Phi_{ij}(x)]^{(1-\alpha)/2}. \]
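
Part (ii) can be verified in a few lines. The sketch below assumes the volume form reads $\omega^{(\alpha)}(x) = \det[\Phi_{ij}(x)]^{(1-\alpha)/2}$, uses the standard condition that a volume form $\omega\, dx^1 \wedge \cdots \wedge dx^n$ is parallel exactly when $\partial_k \omega = \omega \sum_i \Gamma^{i}_{ki}$, and raises the last index of (13.49) with $\Psi^{lm}$.

```latex
% Connection coefficients with one raised index, from (13.49):
\[
\Gamma^{(\alpha)\,l}_{ki} = \sum_m \Psi^{lm}\,\Gamma^{(\alpha)}_{ki,m}
  = \frac{1-\alpha}{2} \sum_m \Psi^{lm}\,\Phi_{kim},
\qquad
\sum_i \Gamma^{(\alpha)\,i}_{ki} = \frac{1-\alpha}{2} \sum_{i,m} \Psi^{im}\,\Phi_{imk}.
\]
% Jacobi's formula for the derivative of a determinant:
\[
\partial_k \log \det[\Phi_{ij}] = \sum_{i,m} \Psi^{im}\,\Phi_{imk}.
\]
% Hence
\[
\partial_k \log \omega^{(\alpha)}
  = \frac{1-\alpha}{2}\,\partial_k \log\det[\Phi_{ij}]
  = \sum_i \Gamma^{(\alpha)\,i}_{ki},
\]
% which is the parallelism condition for the alpha-connection.
```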


6 The subscript x (or u below) indicates that the x-coordinate system (or u-coordinate
system, resp.) is being used. Recall from Section 13.2.4 that under x (u, resp.) local
coordinates g and Γ, in component forms, are expressed by lower (upper, resp.) indices.

The reader is reminded that the metric and conjugated connections in the
forms (13.48) and (13.49) are induced from (13.5). Using the convex conjugate
$\tilde{\Phi}: \tilde{V} \to \mathbb{R}$ given by (13.6), we introduce the following family of divergence
functions $\tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y): V \times V \to \mathbb{R}$, defined by

\[ \tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x,y) \equiv D^{(\alpha)}_{\tilde{\Phi}}((\partial\Phi)(x), (\partial\Phi)(y)). \]

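Explicitly written, this new family of divergence functions is

\[ \tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x,y) = \frac{4}{1-\alpha^2} \left[ \frac{1-\alpha}{2}\,\tilde{\Phi}(\partial\Phi(x)) + \frac{1+\alpha}{2}\,\tilde{\Phi}(\partial\Phi(y)) - \tilde{\Phi}\!\left( \frac{1-\alpha}{2}\,\partial\Phi(x) + \frac{1+\alpha}{2}\,\partial\Phi(y) \right) \right]. \]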


Straightforward calculation shows that $\tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x,y)$ induces the α-Hessian
structure $\{M, g_x, \Gamma_x^{(-\alpha)}, \Gamma_x^{*(-\alpha)}\}$, where $\Gamma^{(\mp\alpha)}$ are given by (13.49); that is, the
pair of α-connections are themselves “conjugate” (in the sense of α ↔ −α)
to those induced by $D^{(\alpha)}_\Phi(x, y)$.


13.3.2 Biorthogonal Coordinates on α-Hessian Manifold

Suppose that, instead of choosing $x = [x^1, \ldots, x^n]$ as the local coordinates for the
manifold $M$, we use its biorthogonal counterpart $u = [u_1, \ldots, u_n]$ to index points
on $M$. Under this u-coordinate system, the divergence function $D^{(\alpha)}_\Phi$ between
the same two points on $M$ becomes

\[ \tilde{D}^{(\alpha)}_\Phi(u,v) \equiv D^{(\alpha)}_\Phi((\partial\tilde{\Phi})(u), (\partial\tilde{\Phi})(v)). \]


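Explicitly written,

\[ \tilde{D}^{(\alpha)}_\Phi(u,v) = \frac{4}{1-\alpha^2} \left[ \frac{1-\alpha}{2}\,\Phi((\partial\Phi)^{-1}(u)) + \frac{1+\alpha}{2}\,\Phi((\partial\Phi)^{-1}(v)) - \Phi\!\left( \frac{1-\alpha}{2}\,(\partial\Phi)^{-1}(u) + \frac{1+\alpha}{2}\,(\partial\Phi)^{-1}(v) \right) \right]. \]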


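Recalling our notation (13.19) and (13.20), we have

Corollary 13.2. The α-Hessian manifold $\{M, g_u, \Gamma_u^{(\alpha)}, \Gamma_u^{*(\alpha)}\}$ associated
with $\tilde{D}^{(\alpha)}_\Phi(u,v)$ is given by

\[ g^{ij}(u) = \tilde{\Phi}^{ij}(u), \tag{13.50} \]

\[ \Gamma^{(\alpha)ij,k}(u) = \frac{1+\alpha}{2}\,\tilde{\Phi}^{ijk}, \qquad \Gamma^{*(\alpha)ij,k}(u) = \frac{1-\alpha}{2}\,\tilde{\Phi}^{ijk}. \tag{13.51} \]

Here, $\tilde{\Phi}^{ij}$, $\tilde{\Phi}^{ijk}$ denote, respectively, second and third partial derivatives of
$\tilde{\Phi}(u)$,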
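Table 13.1 Divergence functions and induced geometry

Divergence function                                                          Domain                                       Induced geometry
$D^{(\alpha)}_{\Phi}(x,y)$                                                   $V \times V \to \mathbb{R}_+$                $\{\Phi_{ij}, \Gamma_x^{(\alpha)}, \Gamma_x^{*(\alpha)}\}$
$D^{(\alpha)}_{\tilde{\Phi}}((\partial\Phi)(x), (\partial\Phi)(y))$          $V \times V \to \mathbb{R}_+$                $\{\Phi_{ij}, \Gamma_x^{(-\alpha)}, \Gamma_x^{*(-\alpha)}\}$
$D^{(\alpha)}_{\tilde{\Phi}}(u,v)$                                           $\tilde{V} \times \tilde{V} \to \mathbb{R}_+$  $\{\tilde{\Phi}^{ij}, \Gamma_u^{(-\alpha)}, \Gamma_u^{*(-\alpha)}\}$
$D^{(\alpha)}_{\Phi}((\partial\tilde{\Phi})(u), (\partial\tilde{\Phi})(v))$  $\tilde{V} \times \tilde{V} \to \mathbb{R}_+$  $\{\tilde{\Phi}^{ij}, \Gamma_u^{(\alpha)}, \Gamma_u^{*(\alpha)}\}$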

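\[ \tilde{\Phi}^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i \partial u_j}, \qquad \tilde{\Phi}^{ijk}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i \partial u_j \partial u_k}. \]

We remark that the same metric (13.50) and the same α-connections
(13.51) are induced by $D^{(-\alpha)}_{\tilde{\Phi}}(u,v) \equiv D^{(\alpha)}_{\tilde{\Phi}}(v,u)$; this follows as a simple
application of Lemma 13.3.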


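An application of (13.23) gives rise to the following relations.

\[ \Gamma^{(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(-\alpha)}_{ij,k}(x), \]

\[ \Gamma^{*(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(\alpha)}_{ij,k}(x), \]

\[ R^{(\alpha)klmn}(u) = \sum_{i,j,\mu,\nu} g^{ik}(u)\, g^{jl}(u)\, g^{\mu m}(u)\, g^{\nu n}(u)\, R^{(\alpha)}_{ij\mu\nu}(x). \]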

The volume form associated with $\Gamma^{(\alpha)}$ is

\[ \omega^{(\alpha)}(u) = \det[\tilde{\Phi}^{ij}(u)]^{(1+\alpha)/2}. \]

When α = ±1, $\tilde{D}^{(\alpha)}_\Phi(u,v)$, as well as $\tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y)$ introduced earlier, take
the form of Bregman divergence (13.2). In this case, the manifold is dually
flat, with curvature tensor $R^{(\pm 1)}_{ij\mu\nu}(x) = R^{(\pm 1)klmn}(u) = 0$.

We summarize the relations between the convex-induced divergence functions
and the geometry they generate in Table 13.1.


13.3.3 Applications of α-Hessian Geometry


Finally, we give an application of the α-Hessian geometry in mathematical
statistics. A statistical model is a set of (what we call) ζ-functions $\zeta \mapsto p(\zeta)$,
where a ζ-function is an element of some function space $B = \{p(\cdot): \mathcal{X} \to \mathbb{R},\ p(\zeta) > 0\}$
over a σ-finite set $\mathcal{X}$ with dominant measure μ. A parametric
model $M_\theta$ is defined as

\[ M_\theta = \{p(\cdot|\theta) \in B,\ \theta \in V \subseteq \mathbb{R}^n\}. \]

That is, $M_\theta$ forms a smooth manifold with θ as coordinates.

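One can define divergence functionals to measure the directed distance
between two ζ-functions p and q. The most familiar is the Kullback–Leibler
divergence. With the aid of a smooth and strictly convex function $f: \mathbb{R} \to \mathbb{R}$
and a strictly increasing function $\rho: \mathbb{R} \to \mathbb{R}$, one can show that the following
is a general form of convex-induced divergence functional,

\[ \mathcal{D}^{(\alpha)}_{f,\rho}(p,q) = \frac{4}{1-\alpha^2} \int_{\mathcal{X}} \left\{ \frac{1-\alpha}{2}\, f(\rho(p)) + \frac{1+\alpha}{2}\, f(\rho(q)) - f\!\left( \frac{1-\alpha}{2}\,\rho(p) + \frac{1+\alpha}{2}\,\rho(q) \right) \right\} d\mu, \tag{13.52} \]

since it is nonnegative and equals zero if and only if $p(\zeta) = q(\zeta)$ almost surely.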
A parametric model $p(\cdot|\theta) \in M_\theta$ is said to be “ρ-affine” if there exists a set
of linearly independent functions $\lambda_i(\zeta) \in B$ such that

\[ \rho(p(\zeta|\theta)) = \sum_i \theta^i \lambda_i(\zeta). \]

The parameter $\theta = [\theta^1, \ldots, \theta^n]$ is called the “natural parameter” of a ρ-affine
parametric model, and the functions $\lambda_1(\zeta), \ldots, \lambda_n(\zeta)$ are the affine
basis functions. Examples of ρ-affine manifolds include the so-called “alpha-affine
manifolds” (Amari, 1985, Amari and Nagaoka, 2000), where $\rho(\cdot)$ takes
on the following form (indexed by $\beta \in [-1, 1]$),

\[ \rho^{(\beta)}(t) = \begin{cases} \log t, & \beta = 1, \\[4pt] \dfrac{2}{1-\beta}\, t^{(1-\beta)/2}, & \beta \in [-1, 1). \end{cases} \]


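When a parametric model is ρ-affine, the function

\[ \Phi(\theta) = \int_{\mathcal{X}} f(\rho(p(\zeta|\theta)))\, d\mu = \int_{\mathcal{X}} f\!\left( \sum_i \theta^i \lambda_i(\zeta) \right) d\mu \]

can be shown to be strictly convex. Therefore, the divergence functional in
(13.52) takes the form of the divergence function $D^{(\alpha)}_\Phi(\theta_p, \theta_q)$ on $V \times V$ given
by

\[ \frac{4}{1-\alpha^2} \left( \frac{1-\alpha}{2}\,\Phi(\theta_p) + \frac{1+\alpha}{2}\,\Phi(\theta_q) - \Phi\!\left( \frac{1-\alpha}{2}\,\theta_p + \frac{1+\alpha}{2}\,\theta_q \right) \right). \]

This is exactly (13.5)! An immediate consequence is that a ρ-affine manifold
is the α-Hessian manifold, with metric and affine connections given by
Proposition 13.4.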
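
The reduction of the divergence functional to a divergence function of the natural parameters can be checked on a small discrete model. Everything specific in the sketch is an assumption made for illustration: a five-point sample space with counting measure, the basis functions $\lambda_1(\zeta) = \zeta$ and $\lambda_2(\zeta) = \zeta^2$, the pair $f = \exp$, $\rho = \log$, and all function names.

```python
import math

ALPHA = 0.4
SAMPLE_SPACE = [0.0, 0.5, 1.0, 1.5, 2.0]      # finite set with counting measure
BASIS = [lambda z: z, lambda z: z * z]        # hypothetical affine basis lambda_i

f, rho = math.exp, math.log                   # f strictly convex, rho increasing

def model(theta):
    """rho-affine model: rho(p(z|theta)) = sum_i theta^i lambda_i(z)."""
    return [math.exp(sum(t * lam(z) for t, lam in zip(theta, BASIS)))
            for z in SAMPLE_SPACE]

def mix(a, b, alpha=ALPHA):
    """Convex mixture with weights (1 - alpha)/2 and (1 + alpha)/2."""
    return 0.5 * (1 - alpha) * a + 0.5 * (1 + alpha) * b

def divergence_functional(p, q, alpha=ALPHA):
    """Discrete analogue of the divergence functional (13.52)."""
    total = sum(mix(f(rho(pz)), f(rho(qz)), alpha)
                - f(mix(rho(pz), rho(qz), alpha))
                for pz, qz in zip(p, q))
    return 4.0 / (1 - alpha ** 2) * total

def potential(theta):
    """Phi(theta) = sum over the sample space of f(rho(p(z|theta)))."""
    return sum(f(rho(pz)) for pz in model(theta))

def divergence_function(tp, tq, alpha=ALPHA):
    """D^{(alpha)}_Phi(theta_p, theta_q), in the form the text identifies with (13.5)."""
    tm = [mix(a, b, alpha) for a, b in zip(tp, tq)]
    return 4.0 / (1 - alpha ** 2) * (mix(potential(tp), potential(tq), alpha)
                                     - potential(tm))

theta_p, theta_q = [0.2, -0.3], [-0.5, 0.1]
lhs = divergence_functional(model(theta_p), model(theta_q))
rhs = divergence_function(theta_p, theta_q)
print(lhs, rhs)
```

The two values coincide because $\rho(p)$ is linear in θ for a ρ-affine model, so mixing the representations pointwise is the same as mixing the parameters.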

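For any ζ-function $\zeta \mapsto p(\zeta)$, we now define

\[ \eta_i = \int_{\mathcal{X}} f'(\rho(p(\zeta)))\, \lambda_i(\zeta)\, d\mu \]

such that $\eta = [\eta_1, \ldots, \eta_n] \in \tilde{V} \subseteq \mathbb{R}^n$. We call η the “expectation parameter”
of $p(\zeta)$ with respect to the set of (affine basis) functions $\lambda_1(\zeta), \ldots, \lambda_n(\zeta)$. It
can be easily verified that for the ρ-affine parametric models,

\[ \eta_i = \frac{\partial \Phi(\theta)}{\partial \theta^i}. \]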


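Define

\[ \tilde{\Phi}^*(\theta) = \int_{\mathcal{X}} \tilde{f}(f'(\rho(p(\zeta|\theta))))\, d\mu, \]

where $\tilde{f}: \mathbb{R} \to \mathbb{R}$ is the Fenchel conjugate of $f$; then $\tilde{\Phi}(\eta) \equiv \tilde{\Phi}^*((\partial\Phi)^{-1}(\eta))$
is the Fenchel conjugate of $\Phi(\theta)$. The pair of convex functions $\Phi$, $\tilde{\Phi}$ induces
η, θ via:

\[ \frac{\partial \Phi(\theta)}{\partial \theta^i} = \eta_i, \qquad \frac{\partial \tilde{\Phi}(\eta)}{\partial \eta_i} = \theta^i. \]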
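
The two claims above (that η is the gradient of Φ, and that the integral of $\tilde{f}(f'(\rho(p)))$ gives the Fenchel conjugate) can be checked on a discrete example. As before, this is only a sketch under assumptions introduced here: counting measure on a five-point set, a hypothetical basis $\lambda_1(\zeta) = \zeta$, $\lambda_2(\zeta) = \zeta^2$, and $f = \exp$, $\rho = \log$, so that $f'(\rho(p)) = p$ and $\tilde{f}(s) = s \log s - s$.

```python
import math

SAMPLE_SPACE = [0.0, 0.5, 1.0, 1.5, 2.0]      # finite set, counting measure
BASIS = [lambda z: z, lambda z: z * z]        # hypothetical affine basis lambda_i

def p(theta):
    """rho-affine model with rho = log: p(z|theta) = exp(sum_i theta^i lambda_i(z))."""
    return [math.exp(sum(t * lam(z) for t, lam in zip(theta, BASIS)))
            for z in SAMPLE_SPACE]

def potential(theta):
    """Phi(theta) = sum_z f(rho(p)) with f = exp, i.e. the total mass of p."""
    return sum(p(theta))

def expectation_parameter(theta):
    """eta_i = sum_z f'(rho(p(z))) lambda_i(z); here f'(rho(p)) = p."""
    return [sum(pz * lam(z) for pz, z in zip(p(theta), SAMPLE_SPACE))
            for lam in BASIS]

def conjugate_via_integral(theta):
    """sum_z conj_f(f'(rho(p))), with conj_f(s) = s log s - s the conjugate of exp."""
    return sum(pz * math.log(pz) - pz for pz in p(theta))

theta = [0.2, -0.3]
eta = expectation_parameter(theta)

# Central-difference gradient of Phi, to compare with eta.
h = 1e-6
grad = []
for i in range(len(theta)):
    up = list(theta); up[i] += h
    dn = list(theta); dn[i] -= h
    grad.append((potential(up) - potential(dn)) / (2 * h))

# Legendre transform value <theta, eta> - Phi(theta) at eta = dPhi(theta).
legendre = sum(t * e for t, e in zip(theta, eta)) - potential(theta)
print(eta, grad, legendre, conjugate_via_integral(theta))
```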


In theoretical statistics, we can call $\Phi(\theta)$ the generalized cumulant generating
function (or partition function), and $\tilde{\Phi}(\eta)$ the generalized entropy
function. Natural parameter θ and expectation parameter η, which form
biorthogonal coordinates, play important roles in statistical inference.


13.4 Summary and Open Problems 


For two smooth, strictly convex functions $\Phi$, $\tilde{\Phi}$ that are mutually conjugate,
the variables $u = \partial\Phi(x)$ and $x = \partial\tilde{\Phi}(u)$ are in one-to-one correspondence. It
has been shown in this chapter that such a pair of variables can be viewed
as biorthogonal coordinate systems on a Riemannian manifold whose metric
is the second derivative of $\Phi$ when the x-coordinate system is used (or of
$\tilde{\Phi}$ when the u-coordinate system is used). Furthermore, a family of affine
connections (indexed by α) can be defined with nonzero curvatures except
for α = ±1, the dually flat case (the so-called “Hessian manifold”). Each
of these α-connections is equiaffine and admits a parallel volume form, and
the entire family is induced from the divergence function $D^{(\alpha)}_\Phi$ (or $D^{(\alpha)}_{\tilde{\Phi}}$)
associated with any convex function $\Phi$ (or $\tilde{\Phi}$).
Our analysis revealed that the conjugate (±α)-connections reflect two
kinds of duality embodied by the convex-induced divergence function. The
first is referential duality related to the choice of the reference and the
comparison status for the two points (x versus y) for computing the value of the
divergence $D^{(\alpha)}_\Phi(x,y) = D^{(-\alpha)}_\Phi(y,x)$. The second is representational duality
related to the construction of two families of divergence functions, $D^{(\alpha)}_\Phi(x, y)$
versus $D^{(\alpha)}_{\tilde{\Phi}}((\partial\Phi)(x), (\partial\Phi)(y))$, using conjugate convex functions (see Table
13.1 in Section 13.3). The geometric quantities expressed in x-coordinates
and expressed in u-coordinates are related to each other via Proposition 13.1.
When α = ±1, the two members of divergence functions coincide (and become
Bregman divergence), so that the two kinds of duality reveal themselves
as biduality:

\[ D^{(1)}_\Phi(x,y) = D^{(1)}_{\tilde{\Phi}}(\partial\Phi(y), \partial\Phi(x)) = D^{(-1)}_{\tilde{\Phi}}(\partial\Phi(x), \partial\Phi(y)) = D^{(-1)}_\Phi(y, x), \]

which is compactly written in the form of canonical divergence as

\[ A_\Phi(x,v) = A_{\tilde{\Phi}}(v, x). \]


The relation between convex-induced divergence functions and α-connections
is intriguing; that α as a convex mixture parameter coincides with α as
indexing the family of connections is remarkable! We know that, in general,
there may be many families of divergence functions that could yield the same
α-connections. An explicit construction is as follows. Take the families of
divergence functions ($\gamma \in \mathbb{R}$, $\beta \in [-1,1]$)

\[ \frac{1+\gamma}{2}\, D^{(\beta)}_\Phi(x,y) + \frac{1-\gamma}{2}\, D^{(-\beta)}_\Phi(x,y), \]

which induce an α-Hessian structure whose metric and conjugate connections
are given in the forms (13.48) and (13.49), with α taking the value of βγ.
The nonuniqueness of divergence functions giving rise to the family of
α-connections invites the question of how to characterize the convex-induced
divergence functions from the perspective of α-Hessian geometry. There is
reason to believe that such axiomatization is possible because (i) the form of
divergence function for the dually flat manifold (α = ±1) is unique, namely,
the Bregman divergence $B_\Phi$; (ii) Lemma 13.1 gives that $D^{(\alpha)}_\Phi \ge 0$ if and only
if $B_\Phi \ge 0$ for any smooth function $\Phi$. This hints at a deeper connection yet to
be understood between convexity of a function and the α-Hessian geometry.
Another topic that needs further investigation is with respect to affine
hypersurface realization of the α-Hessian manifold. We know that in affine
immersion, geometric divergence is a generalization of the canonical divergence
of dually flat (i.e., Hessian) manifolds. How to model the nonflat manifold
with a general α value remains an open question. In particular, is there a
generalization of geometric divergence that mirrors the way a convex-induced
divergence function $D^{(\alpha)}_\Phi$ generalizes Bregman divergence $B_\Phi$ (or equivalently,
the canonical divergence $A_\Phi$)?

Finally, how do we extend the above analysis to an infinite-dimensional 
setting? The use of convex analysis (in particular, Young function and Orlicz 
space) to model the infinite-dimensional probability manifold yields fruitful 
insights for understanding difficult topological issues (Pistone and Sempi, 
1995). It would thus be a worthwhile effort to extend the notion of bior- 
thogonal coordinates to the infinite-dimensional manifold to study nonpara- 
metric information geometry. To this end, it would also be useful to extend 
the affine hypersurface theory to the infinite-dimensional setting and pro- 
vide the formulation for co-dimension one affine immersion and co-dimension 
two centroaffine immersion. Here, affine hypersurfaces are submanifolds (re- 
sulting from normalization and positivity constraints on probability density 
functions; see, e.g., Zhang and Hästö, 2006) of an ambient manifold of unrestricted
Banach space functions. Preliminary analyses (Zhang, 2006b) show
that such an ambient manifold is flat for all α-connections, $\alpha \in \mathbb{R}$. So it
provides a natural setting (i.e., affine space) in which probability densities 
can be embedded as an affine hypersurface. The value of such a viewpoint 
for statistical inference remains a topic for future exploration. 
Chapter 14
NMR Quantum Computing 


Zhigang Zhang, Goong Chen, Zijian Diao, and Philip R. Hemmer 


Summary. Quantum computing is at the forefront of scientific and technological
research and development of the 21st century. NMR quantum computing
is one of the most mature technologies for implementing quantum computation.
It utilizes the motion of spins of nuclei in custom-designed molecules
manipulated by RF pulses. The motion is on a nano- or microscopic scale governed
by the Schrödinger equation in quantum mechanics. In this chapter, we
explain the basic ideas and principles of NMR quantum computing, including 
basic atomic physics, NMR quantum gates, and operations. New progress in 
optically addressed solid-state NMR is expounded. Examples of Shor’s al- 
gorithm for factorization of composite integers and the quantum lattice-gas 
algorithm for the diffusion partial differential equation are also illustrated. 


14.1 Nuclear Magnetic Resonance 


Many chapters in this book are concerned with mathematical problems in 
mechanics, elasticity, fluid mechanics, materials, and so on, which are on the 
macroscale. At the other extreme is the study of problems in atoms and 
molecules, photonics, nanotechnology, and the like, which are of the micro- 
or nanoscale governed chiefly by the Schrödinger equation. This area has undergone
rapid advancement during the past ten years, in large part due to the
stimuli from laser applications, quantum computing and quantum technol- 
ogy, and nanoelectronics. Most of the practitioners in this area are physicists 
and it appears that this area has not drawn enough attention from mathe- 
maticians. Here, we wish to describe one such development, namely, nuclear 
magnetic resonance (NMR) quantum computing. There already exist many 
papers on this topic (see, e.g., [18, 19, 16, 56, 58, 109]) written by physicists 
and computer scientists. Our chapter addresses the same topic, but
perhaps from a more mathematical point of view.


14.1.1 Introduction 


As of today, NMR is the most mature technology for the implementa- 
tion of quantum computing. Naturally, this area is rife with papers. A 
good Internet resource for looking up NMR quantum computing references, 
both old and new, is the U.S. Los Alamos National Laboratory’s Web site 
http://xxx.lanl.gov/quant-ph. 

At present, several types of elementary quantum computing devices have 
been developed, based on AMO (atomic, molecular, and optical) or semi- 
conductor physics and technologies. We may roughly classify them into the 
following. 


Atomic — ion and atom traps, cavity QED [13]. 

Molecular — NMR. 

Semiconductor — coupled quantum dots [12], silicon (Kane) [59]. 
Crystal structure — nitrogen-vacancy (NV) diamond. 
Superconductivity — SQUID. 


The above classification is not totally rigorous as new types of devices, such
as quantum dots, or ion traps embedded in cavity-QED, have emerged which
are of a hybrid nature. Also, laser pulse control, which is of an optical nature,
seems to be omnipresent. In [3], a total of 12 types of quantum computing
proposals have been listed.¹ Nevertheless, it is clear that NMR quantum
computing belongs to the class of molecular computing where we use
molecules as a small computer. The logic bits are the nuclear spins of atoms in
custom-designed molecules. Spin flips are achieved through the application of
radio-frequency (RF) fields on resonance at the nuclear spin frequencies. The
system can be initialized by cooling the system down to the ground state or
known low-entropy state, or using a special technology called averaging,
especially for liquid NMR working at room temperature. Measurement or readout
is carried out by measuring the magnetic induction signal generated by the
precessing spin on the receiver coil. Numerous experiments have been
successfully tried for different algorithms, mostly using liquid NMR technology. The
algorithms tested include Grover’s search algorithm [108, 58, 46, 122], other
generalized search algorithms [76], quantum Fourier transforms [26, 114],
Shor’s algorithm [111], Deutsch–Jozsa algorithm [15, 74, 27, 84, 24], order
finding [107, 100], error correcting code [65], and dense coding [33]. There
are also other implementations reported such as the cat-code benchmark [64],
information teleportation [87], and quantum system simulation [98].

¹ The additional proposals not listed above but given in [3] are quantum Hall qubits,
electrons in liquid helium, and spin spectroscopies.

NMR is an important tool in chemistry. It has been used to determine the
molecular structure and composition of solids, liquids, and gases since the
mid-1940s, when it was discovered independently by research groups at
Stanford and Harvard, led by F. Bloch and E.M. Purcell, respectively, who
shared the Nobel Prize in Physics in 1952 for the discovery.

There are many excellent monographs on NMR [31, 91, 82]. There are 
also many other nice Internet Web-site resources offering concise but highly 
useful information about NMR (cf., e.g., [28, 51, 115]). Let us briefly explain 
the physics of NMR by following Edwards [28]. The NMR phenomenon is
based on the fact that the spins of atomic nuclei have magnetic properties
that can be utilized to yield chemical, physical, and biological information.
Through the famous Stern–Gerlach experiment in the earlier development of
quantum mechanics, it is known that subatomic particles (protons, neutrons,
and electrons) have spins. A nucleus with spin behaves like a bar magnet in a
magnetic field. In some atoms, for example, ¹²C (carbon-12), ¹⁶O (oxygen-16),
and ³²S (sulphur-32), these spins are paired and cancel each other out
so that the nucleus of the atom has no overall spin. However, in many atoms
(¹H, ¹³C, ³¹P, ¹⁵N, ¹⁹F, etc.) the nucleus does possess an overall spin. To
determine the spin of a given nucleus one can use the following rules.


1. If the number of neutrons and the number of protons are both even, the 
nucleus has no spin. 

2. If the number of neutrons plus the number of protons is odd, then the 
nucleus has a half-integer spin (i.e., 1/2, 3/2, 5/2). 

3. If the number of neutrons and the number of protons are both odd, then 
the nucleus has an integer spin (i.e., 1, 2, 3). 
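
The three rules translate directly into a small classifier. The sketch below is illustrative only; the function name and the returned labels are our own, and the rules give the class of spin (zero, half-integer, integer), not its actual value.

```python
def nuclear_spin_class(protons: int, neutrons: int) -> str:
    """Classify the nuclear spin from proton and neutron counts (rules 1-3)."""
    if protons < 0 or neutrons < 0:
        raise ValueError("particle counts must be nonnegative")
    if protons % 2 == 0 and neutrons % 2 == 0:
        return "zero"            # rule 1: both counts even
    if (protons + neutrons) % 2 == 1:
        return "half-integer"    # rule 2: odd mass number
    return "integer"             # rule 3: both counts odd

# (name, protons, neutrons)
examples = [("1H", 1, 0), ("2H", 1, 1), ("12C", 6, 6),
            ("13C", 6, 7), ("14N", 7, 7), ("31P", 15, 16)]
for name, z, n in examples:
    print(name, nuclear_spin_class(z, n))
```

For instance, ¹²C falls under rule 1 (no spin), ¹H and ¹³C under rule 2, and ²H and ¹⁴N under rule 3.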


In quantum mechanical terms, the nuclear magnetic moment of a nucleus
can align with an externally applied magnetic field of strength $B_0$ in only
$2I+1$ ways, either with or against the applied field $B_0$, where $I$ is the nuclear
spin given in (1), (2), and (3) above. For example, for a single nucleus with
$I = 1/2$, only one transition is possible between the two energy levels. The
energetically preferred orientation has the magnetic moment aligned parallel
with the applied field (spin $m = +1/2$) and is often denoted as α, whereas
the higher-energy anti-parallel orientation (spin $m = -1/2$) is denoted as
β. See Figure 14.1. In NMR quantum computing, these spin-up and spin-down
quantum states resemble the two binary states 0 and 1 in a classical
computer. Such a nuclear spin can serve as a quantum bit, or qubit. The
rotational axis of the spinning nucleus cannot be oriented exactly parallel
(or anti-parallel) with the direction of the applied field $B_0$ (aligned along the
z-axis) but must precess (motion similar to a gyroscope) about this field at
an angle, with an angular velocity $\omega_0$, given by the expression $\omega_0 = \gamma B_0$.
The precession rate $\omega_0$ is called the Larmor frequency (cf. Figure 14.2). See
more discussion of $\omega_0$ below. The constant γ is called the magnetogyric ratio.
This precession process generates a magnetic field with frequency $\omega_0$. If we
irradiate the sample with radio waves (MHz), then the proton can absorb the
energy and be promoted to the higher-energy state. This absorption is called
resonance because the frequency of the applied radiation coincides with the
precession frequency.

Fig. 14.1 Splitting of energy levels of a nucleus with spin quantum number 1/2.

Fig. 14.2 A magnetic field $B_0$ is applied along the z-axis, causing the spinning nucleus
to precess around the applied magnetic field.
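
To give a sense of scale for the relation $\omega_0 = \gamma B_0$, the sketch below evaluates it for a proton. Note the assumptions: the code works in SI units (tesla) rather than the Gaussian units used elsewhere in the chapter, the proton magnetogyric ratio of about $2.675 \times 10^{8}$ rad s⁻¹ T⁻¹ is a value supplied here rather than taken from the text, and the field strengths are arbitrary examples.

```python
import math

# Assumed value (not from the text): proton magnetogyric ratio in SI units,
# about 2.675e8 rad s^-1 T^-1.
GAMMA_PROTON = 2.6752218744e8

def larmor_angular_frequency(gamma: float, b0_tesla: float) -> float:
    """omega_0 = gamma * B_0, in rad/s."""
    return gamma * b0_tesla

def larmor_frequency_mhz(gamma: float, b0_tesla: float) -> float:
    """Precession frequency omega_0 / (2 pi), converted to MHz."""
    return larmor_angular_frequency(gamma, b0_tesla) / (2 * math.pi) / 1e6

for b0 in (1.0, 7.05, 11.74):
    print(f"B0 = {b0:5.2f} T -> {larmor_frequency_mhz(GAMMA_PROTON, b0):7.2f} MHz")
```

The resulting frequencies fall in the tens to hundreds of MHz, which is why the excitation is described as radio-frequency.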

There is another technique related to NMR, called electron spin resonance 
(ESR), that deals with the spins of electrons instead of those of the nuclei. 
The principles for ESR are nevertheless similar. 

Quantum entanglement is accomplished through spin-spin coupling from 
the electronic bonds between the nuclei within the molecule and special RF 
pulse manipulations. 

We now examine some fundamentals of atomic physics that are essential 
in any quantitative study of the manipulation of the quantum behavior of 
atoms. A complete description of the Hamiltonian (i.e., energy) of an atom 
contains nine terms as follows [31]:

\[ H = H_{el} + H_{CF} + H_{LS} + H_{SS} + H_{Ze} + H_{HF} + H_{Zn} + H_{II} + H_{Q}. \tag{14.1} \]
Fig. 14.3 Schematic diagram of an NMR apparatus. A sample which has nonzero-spin
nuclei is put in a static magnetic field regulated by the current through the magnet coil. 
A transmitter coil provides the perpendicular field and a receiver coil picks up the signal. 
We can change the current through the magnet coil or change the frequency of the current 
in the transmitter coil to reach resonance. 


The first three terms, which have the highest order, are called the atomic
Hamiltonian. They are the electronic Hamiltonian term, crystal field term, and the
spin-orbit interaction term, respectively. The electronic Hamiltonian consists
of kinetic energy of all electrons, $mv^2/2 = p^2/2m$, and two Coulomb terms:
the potential energy of electrons relative to the nuclei, $-z_n e^2/r_{ni}$, and the
interelectronic repulsion energy, $e^2/r_{ij}$:

\[ H_{el} = \sum_i \frac{p_i^2}{2m} - \sum_{i,n} \frac{z_n e^2}{r_{ni}} + \sum_{i<j} \frac{e^2}{r_{ij}}, \]

where $r_{ni}$ denotes the distance between the ith electron and the nth nucleus,
and $r_{ij}$ denotes the interelectronic distance between the ith and the jth
electrons.

The term $H_{CF}$ is called the crystal field term. It comes from the interaction
between the electron and the electronically charged ions forming the crystal, 
and is essentially a type of electrical potential energy: 


pou 
ag 


Tig 


where $Q_j$ is the ionic charge and $r_{ij}$ is the distance from the electron to the
ion. Normally, only those ions nearest to the electron are considered. 
The third in the atomic Hamiltonian is the interaction between spin and
orbit:

\[ H_{LS} = \lambda\, L \cdot S, \]

where L and S are the angular momenta of the orbit and spin, respectively,
and λ is the coupling constant. In this section, we use S for the electron spin
and I for the nuclear spin.

The remaining six terms are called spin Hamiltonians. Terms $H_{Ze}$ and
$H_{Zn}$ are two that result from the application of an external magnetic field:

\[ H_{Ze} = \beta B \cdot (L + S), \]

where B is the magnetic field strength. These two terms are called Zeeman
terms, and they play major roles in NMR and ESR.

The nuclear spin-spin interaction term $H_{II}$ is also important in quantum
computation:

\[ H_{II} = \sum_{i>j} I_i \cdot J_{ij} \cdot I_j, \]

because it provides a mechanism for the interaction between qubits. Hyperfine
interaction arises from the interaction between the nuclear magnetic moments
and the electron:

\[ H_{HF} = S \cdot \sum_i A_i \cdot I_i. \]

In (14.1), by letting the z-axis be the privileged direction of spin measurement,
the spin-spin interaction term $H_{SS}$ is expressed as

\[ H_{SS} = D\left[S_z^2 - \tfrac{1}{3} S(S+1)\right] + E\left(S_x^2 - S_y^2\right). \]


The very last term in (14.1) is called the quadrupolar energy: 


e?Q = 


He= ae oH) (312 — 117 +1) + (2 - #)). 


OL? 


For a specific system, only the terms of the Hamiltonian that play major roles
are needed in the final model. For example, in the study of ESR, only three terms are
retained and the Hamiltonian is written as

\[ H = H_{Ze} + H_{HF} + H_{SS}, \]

whereas in the NMR case,

\[ H = H_{Zn} + H_{II}. \]
14.1.2 More about the Hamiltonian of NMR


A classical way to explain NMR is to regard the nucleus as a rotating charged particle
that acts as a current circulating in a loop ([31, 10]), which creates a magnet
with magnetic moment μ, $\mu = qvr/2$, where q is the electronic charge. The
particle is rotating at $v/2\pi r$ revolutions per second.

Converting μ to electromagnetic units by dividing it by the velocity of
light, and using angular momentum of the particle rather than the velocity
of the particle, we obtain

\[ \mu = (q/2Mc)\, p, \]

where p is the angular momentum oriented along the rotating axis. The ratio
μ/p is called the magnetogyric ratio, denoted by γ. A static magnetic field
with strength B will apply a torque, which is equal to μ × B, on this particle.
Newton’s law states that the angular momentum will change according to a
differential equation

\[ \frac{dp}{dt} = \mu \times B = \frac{q}{2Mc}\, p \times B. \]

Computation shows that p will rotate around the direction of B with
frequency $\omega_0$ defined by

\[ \omega_0 = \frac{q}{2Mc}\, B = \gamma B. \]


The above is called the Larmor equation, and the frequency $\omega_0$ is called the
Larmor frequency, the precession frequency, or the resonance frequency as
mentioned previously in Figure 14.2.

The above classical considerations are now modified by quantization to
incorporate the quantum-mechanical behaviors of the nuclear spin. The vector
variable p is quantized with quantum number $(I(I+1))^{1/2}$, and its projection
to the z-axis (the direction of the magnetic field) is $m\hbar$. In total, there are
$2I+1$ valid values of m evenly distributed from $-I$ to $I$; that is, $m = -I$,
$-I+1, \ldots, I-1, I$. A factor g is introduced to include both the spin and
orbital motion in the total angular momentum, called the Landé or spectroscopic
splitting factor. For a free electron and proton, the magnetic moments
can be given as

\[ \mu_e = \frac{g_e}{2} \left( \frac{he}{4\pi M_e c} \right) = \frac{g_e \beta}{2}, \]

\[ \mu_n = g_n \left( \frac{he}{4\pi M_n c} \right) I = g_n I \beta_N, \]

where $g_e = 2.0023$, $g_n = 5.58490$. Numbers β and $\beta_N$ are called, respectively,
the Bohr and the nuclear magneton, where $\beta = 9.27 \times 10^{-21}$ erg gauss$^{-1}$ and
$\beta_N = 5.05 \times 10^{-24}$ erg gauss$^{-1}$. The g values vary for different particles. In
NMR, it is convenient to use the resonance frequency $\omega_0$:

\[ \hbar\omega_0 = g_e \beta B_0, \]
\[ \hbar\omega_0 = g_n \beta_N B_0. \]


Now we can write the Hamiltonian of a free nucleus as

\[ H = -\mu \cdot B = -\hbar\gamma\, I \cdot B, \tag{14.2} \]

where γ is the magnetogyric ratio defined by $\gamma = \mu/I\hbar$ just as in the classical
case. It is a characteristic constant for every type of nuclei; different nuclei
have different magnetogyric ratios. Vector I, after quantization, becomes the
operator of angular momentum. The eigenvalues of this system, or the energy
levels, are

\[ E = -\gamma\hbar m B, \qquad m = -I, -I+1, \ldots, I-1, I. \tag{14.3} \]

The difference between two neighboring energy levels is $\gamma\hbar B$, which defines
the resonance frequency depending on the magnetic field B and the particle.
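
The level structure is easy to enumerate. The sketch below takes $E_m = -\gamma\hbar m B$, the sign that follows from $H = -\hbar\gamma\, I \cdot B$ in (14.2), and works in SI units; the function names, the default value of ħ, and the proton magnetogyric ratio used in the example are assumptions introduced here.

```python
from fractions import Fraction

HBAR = 1.054571817e-34   # reduced Planck constant in J s (SI; assumed here)

def magnetic_quantum_numbers(spin):
    """Return the 2I + 1 values m = -I, -I+1, ..., I for integer or half-integer I."""
    I = Fraction(spin)
    if I < 0 or (2 * I).denominator != 1:
        raise ValueError("spin must be a nonnegative integer or half-integer")
    return [-I + k for k in range(int(2 * I) + 1)]

def energy_levels(spin, gamma, b_field, hbar=HBAR):
    """Energies E_m = -gamma * hbar * m * B, listed in order of increasing m."""
    return [-gamma * hbar * float(m) * b_field
            for m in magnetic_quantum_numbers(spin)]

print(magnetic_quantum_numbers(Fraction(3, 2)))

# Spin-1/2 example with an assumed proton magnetogyric ratio (rad s^-1 T^-1) at 1 T.
levels = energy_levels(Fraction(1, 2), 2.675e8, 1.0)
print(levels, abs(levels[1] - levels[0]))
```

For spin 1/2 there are two levels, and their separation equals $\gamma\hbar B$, the quantum of energy absorbed at resonance.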

There are other factors to be considered. The resonance frequency changes
with the chemical environment of the nucleus. An example is the fluorine
resonance spectrum of perfluoroisopropyl iodide. Two resonance lines of fluorine
are observed in the spectrum, and their intensity ratio of 6:1 agrees with the
population ratio of the two groups of fluorine atoms. This phenomenon, called
the chemical shift, is proportional to the strength of the magnetic field
applied. This effect arises because electrons close to the nucleus change the
magnetic field around it; in other words, they create a diamagnetic shielding
surrounding the nucleus. If the static field applied is $B_0$, then the electrons
precessing around the magnetic field direction produce an induced magnetic
field $B'$ opposing $B_0$. The total effective magnetic field around the nucleus is
then

\[ B = B_0 - B' = (1 - \sigma) B_0, \]

where the parameter σ is called the shielding coefficient. In some cases σ is
dependent on the temperature.

High-resolution NMR spectroscopy has found that the chemically shifted
peaks are also composed of several lines, a result of the spin-spin coupling,
which is the second term in the NMR Hamiltonian:

\[ H_{II} = \sum_{i>j} I_i \cdot J_{ij} \cdot I_j. \]


14.1.3 Organization of the Chapter 


Section 14.1 thus far has introduced some basic facts of nuclear spins and 
atomic physics. 

In Section 14.2, we give background on what quantum computing is about, 
and introduce universal quantum gates based on liquid NMR. 
Section 14.3 describes the most recent progress in solid-state NMR quan-
tum gate controls and designs. 

Sections 14.4 and 14.5 explain applications of the NMR quantum computer 
to Shor’s algorithm and a lattice-gas algorithm. 


14.2 Basic Technology Used in Quantum Computation 
with NMR 


14.2.1 Introduction to Quantum Computation 


Quantum mechanics is one of the revolutionary scientific discoveries of the 
20th century. The field of quantum computation, our emphasis in this chap- 
ter, was born when the principles of quantum mechanics were introduced to 
modern computer science. Quantum computation mainly studies the analy- 
sis and construction of quantum algorithms with an eye toward surpassing 
the classical counterparts. Another tightly connected field is quantum infor- 
mation, which deals more with the storage, compression, encryption, and 
communication of information by quantum mechanical means [40, 7]. Quantum
teleportation [6, 11] and quantum cryptography [5, 29] are two of the
best-known subjects of this field.

Modern computer science emerged when the eminent British mathemati- 
cian Alan Turing invented the concept of the Turing machine (TM) in 
1936 [103]. Although very simple and primitive, the TM captures the essence 
of computation. It serves as the universal model for all known physical
computation devices. Quantum effects were not considered in the theory of
computation until the early 1980s. Benioff [4] first
coined the term quantum Turing machine (QTM). Motivated by the
problem that classical computers cannot simulate quantum systems efficiently,
Feynman [35] posed the quantum computer as a solution. Now we know 
that, in terms of computability, quantum computers and classical computers 
possess exactly the same computational power. But in terms of computa- 
tional complexity, which measures the efficiency of computation, there are 
many exciting examples confirming that quantum computers do solve certain
problems faster. The two most significant ones are Shor’s factorization
algorithm [96] and Grover’s search algorithm [46], among other examples such
as the Deutsch-Jozsa problem [24], the Bernstein-Vazirani problem [9], and
Simon’s problem [97].

Current physical realization of quantum computers follows the quantum 
circuit model [23], instead of the QTM model. The quantum circuit model is 
another fundamental model of computation, which is equivalent to the QTM 
model [118], but easier to implement. This model shares many common
features with classical computers. In a classical computer, information is
encoded in multibit binary states (0 or 1), transferred from one register to
another, and processed by logic gates in concatenation. In a quantum
computer, information is represented by the quantum states of the qubits,
and manipulated by various quantum control mechanisms. Those control
mechanisms trigger quantum operations to process information in a way
resembling the gates in a classical computer. Such quantum operations are
called quantum gates, and a series of quantum gates in concatenation
constitutes a quantum circuit [112]. However, because of the special effects
of quantum mechanics, major distinctions exist.

Fig. 14.4 Circuit diagrams of the NOT, Hadamard, phase, CNOT, and controlled-phase gates.

In contrast to a classical system, a quantum system can exist in different
states at the same time, an interesting phenomenon called superposition.
Superposition enables quantum computers to process data in parallel, which
is why a quantum computer can solve certain problems faster than a classical
computer. From now on, we use the Dirac bra-ket notation. In this notation
a pure one-qubit quantum state can be written as
$|\phi\rangle = a|0\rangle + b|1\rangle$. Here $|0\rangle$ and $|1\rangle$
are the two basis states of the qubit (in NMR, for example, the spin-up and
spin-down states), and $a, b \in \mathbb{C}$ with $|a|^2 + |b|^2 = 1$. When
we measure a qubit, the result is either $|0\rangle$ or $|1\rangle$, with
probabilities $|a|^2$ and $|b|^2$, respectively. More generally, a string of
$n$ qubits can exist in any state of the form
$|\psi\rangle = \sum_{x=00\cdots 0}^{11\cdots 1} \psi_x |x\rangle$, where
$\psi_x \in \mathbb{C}$ and $\sum_x |\psi_x|^2 = 1$. When we make a
measurement on $|\psi\rangle$, it collapses to $|x\rangle$, one of the $2^n$
basis states, with probability $|\psi_x|^2$. This probabilistic nature makes
the design of efficient quantum algorithms highly nontrivial.
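
The measurement rule above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `measurement_distribution` and `measure` are hypothetical, the state is passed as a plain list of $2^n$ amplitudes ordered by the binary label $x$, and the random draw stands in for a single ideal projective measurement.

```python
import random


def measurement_distribution(amplitudes):
    """Map each n-bit basis label x to its probability |psi_x|^2."""
    n = (len(amplitudes) - 1).bit_length()
    if len(amplitudes) != 2 ** n:
        raise ValueError("need 2**n amplitudes")
    probs = [abs(a) ** 2 for a in amplitudes]
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("state is not normalized")
    return {format(x, "0{}b".format(n)): p for x, p in enumerate(probs)}


def measure(amplitudes, rng=random):
    """Draw one basis label according to the measurement probabilities."""
    dist = measurement_distribution(amplitudes)
    labels = list(dist)
    return rng.choices(labels, weights=[dist[x] for x in labels])[0]


# Example (assumed state): the two-qubit superposition (|00> + i|11>)/sqrt(2).
state = [2 ** -0.5, 0, 0, 1j * 2 ** -0.5]
dist = measurement_distribution(state)
outcome = measure(state)  # either "00" or "11", each with probability 1/2
```

Repeating `measure` many times only recovers the probabilities $|\psi_x|^2$, not the amplitudes themselves, which is one way to see why algorithm design is delicate.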

Another distinctive feature of the quantum circuit is that the operations
performed by quantum gates must be unitary ($U^\dagger U = I$). This is a
natural consequence of unobserved quantum systems evolving according to the
Schrödinger equation. A quantum gate may operate on any number of qubits.
Here are some examples (cf. Figure 14.4 for the circuit diagrams).
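1. NOT gate $\Lambda_0$: $\Lambda_0|0\rangle = |1\rangle$, $\Lambda_0|1\rangle = |0\rangle$, or

$$\Lambda_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

2. The Hadamard gate $H$: $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, or

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

3. One-qubit phase gate $R_\theta$: $R_\theta|0\rangle = |0\rangle$, $R_\theta|1\rangle = e^{i\theta}|1\rangle$, or

$$R_\theta = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{bmatrix}.$$

4. Two-qubit controlled-NOT (CNOT) gate $\Lambda_1$: $\Lambda_1|00\rangle = |00\rangle$, $\Lambda_1|01\rangle = |01\rangle$,
$\Lambda_1|10\rangle = |11\rangle$, $\Lambda_1|11\rangle = |10\rangle$, or

$$\Lambda_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

5. Two-qubit controlled-phase gate $\Lambda_1(R_\theta)$, where $R_\theta$ is the one-qubit phase gate:
$\Lambda_1(R_\theta)|00\rangle = |00\rangle$, $\Lambda_1(R_\theta)|01\rangle = |01\rangle$, $\Lambda_1(R_\theta)|10\rangle = |10\rangle$,
$\Lambda_1(R_\theta)|11\rangle = e^{i\theta}|11\rangle$, or

$$\Lambda_1(R_\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{i\theta} \end{bmatrix}.$$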
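
As a quick sanity check, the five gates listed above can be written as small matrices and tested for unitarity. The sketch below uses only the Python standard library; the helper names (`matmul`, `dagger`, `is_unitary`) and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ are our own assumptions for illustration.

```python
import cmath
import math


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    """Hermitian adjoint (conjugate transpose)."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def is_unitary(U, tol=1e-12):
    """Check U^dagger U = I entry by entry."""
    P = matmul(dagger(U), U)
    n = len(U)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


def phase(theta):
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def controlled_phase(theta):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 0, 0, cmath.exp(1j * theta)]]


s = 1 / math.sqrt(2)
NOT = [[0, 1], [1, 0]]
HADAMARD = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

gates = [NOT, HADAMARD, phase(0.3), CNOT, controlled_phase(0.3)]
all_unitary = all(is_unitary(g) for g in gates)

# CNOT maps |10> (third basis vector) to |11> (fourth basis vector).
image_of_10 = matmul(CNOT, [[0], [0], [1], [0]])
```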


The one-qubit and two-qubit quantum gates are of particular importance 
to the construction of a quantum computer, because of the following univer- 
sality result. 


Theorem 14.2.1. (DiVincenzo [2, 25]) The collection of all the one-qubit
gates and the two-qubit CNOT gate suffices to generate any unitary operation
on any number of qubits.


Figure 14.5 illustrates, as an example, how to generate the two-qubit con- 
trolled-phase gate using 2 CNOT gates and 3 one-qubit phase gates. The 
controlled-phase gate is an important building block for the quantum Fourier 
transform (cf. Figures 14.14 and 14.15). 
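
One arrangement consistent with the gate count quoted above (two CNOT gates and three one-qubit phase gates) can be checked numerically. The ordering used here is our own choice, a standard construction and not necessarily the layout of Figure 14.5: a CNOT, a phase gate with angle $-\theta/2$ on the second qubit, a second CNOT, and finally phase gates with angle $\theta/2$ on both qubits. Helper names are hypothetical.

```python
import cmath


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product; the first factor acts on the first qubit."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def phase(theta):
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def controlled_phase_from_cnots(theta):
    """Compose the gates in time order (first entry is applied first)."""
    half = phase(theta / 2)
    steps = [CNOT, kron(I2, phase(-theta / 2)), CNOT, kron(half, half)]
    U = kron(I2, I2)
    for gate in steps:
        U = matmul(gate, U)
    return U


theta = 0.7
built = controlled_phase_from_cnots(theta)
expected = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 0, 0, cmath.exp(1j * theta)]]
max_error = max(abs(built[i][j] - expected[i][j])
                for i in range(4) for j in range(4))
```

Only the basis state with both qubits equal to 1 picks up the full phase $e^{i\theta}$; on the other three basis states the half-angle phases cancel.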

The standard procedure of executing a quantum algorithm on a quantum 
circuit usually follows these steps. 


1. Initialize the qubits. 
2. Apply a proper sequence of quantum gates on the qubits. 
3. Measure the qubits. 


We address the details of these steps in the scenario of NMR technology. 
Fig. 14.5 Construction of the controlled-phase gate with CNOT gates and phase gates.


14.2.2 Realization of a Qubit 


As mentioned in Section 14.1, NMR quantum computing is accomplished
by using the spin-up and spin-down states of a spin-1/2 nucleus. A molecule
with several nuclear spins may work as a quantum computer where each spin
constitutes a qubit. In fact, NMR has a long history in information science.
Back in the 1950s, nuclear spins were already used for information storage in
computers.

Liquid NMR receives more interest due to its mature technology and
readiness for application. For now, spin-1/2 nuclei such as protons and
$^{13}$C are preferred because they naturally represent a qubit, but
multilevel qubits formed by spin-n nuclei, n = 1, 2, ..., may provide more
freedom in the future. Through careful design, the potential qubits or
nuclei are configured with different resonance frequencies and can be
distinguished from each other. In a low-viscosity liquid, dipolar coupling
between nuclei is averaged away by the random motion of the molecules. The
spin-spin interaction is then dominated by the J-coupling (scalar
coupling), an indirect through-bond electronic interaction. Previously, a
very difficult part of the system operation was to set the quantum system
to a special state (that is, to initialize it). Elaborate techniques have
since been developed to solve this problem; see Section 14.2.5.

Figure 14.6 shows the structure of a trichloroethylene (TCE) molecule and
a chloroform molecule used in NMR quantum computers. The hydrogen nucleus
(proton) and two $^{13}$C nuclei in a TCE molecule form three qubits which
can be manipulated, and the chloroform molecule provides two qubits. The
sample used by an NMR quantum computer has a large number ($\sim 10^{20}$)
of such molecules, which is why it is also called a bulk quantum computer.
Although most molecules are in a totally random state at room temperature,
a small excess of spins still stands out and serves our purpose.
Theoretically, we use a statistical spin state called a pseudo-pure state,
which has the same transformation property as that of a pure quantum state.

Let $|\phi\rangle = a|0\rangle + b|1\rangle$ be the state of a single qubit,
with $|0\rangle$ for spin-up and $|1\rangle$ for spin-down. We also assume
that $a$ is real because only the relative phase is important. Thus this
state can be represented using two angles $\theta$ and $\varphi$:

$$|\phi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle, \tag{14.4}$$

where $\theta \in [0, \pi]$ and $\varphi \in [0, 2\pi)$. If we think of
$|0\rangle$ and $|1\rangle$ as the standard basis in $\mathbb{C}^2$, the
quantum state corresponds to a unit vector in $\mathbb{C}^2$.

Fig. 14.6 The molecule structure of a candidate 3-qubit quantum system,
trichloroethylene (left), and a candidate 2-qubit quantum system, chloroform.
The trichloroethylene molecule has two labelled $^{13}$C and a proton, all
spin-1/2 nuclei. By considering the static magnetic field and the spin-spin
interaction, its Hamiltonian can be written as a sum of single-spin terms
proportional to $\mathbf{I}_i \cdot \mathbf{B}$ and pairwise coupling terms
$\mathbf{I}_i \cdot \mathbf{J}_{i,j} \cdot \mathbf{I}_j$ with $i < j$. The
chloroform has one labelled $^{13}$C and one proton.

For the study of NMR spectroscopy with many nuclei, density matrices
are preferred and are often written as a linear combination of product
operators [83]:

$$\rho = |\phi\rangle\langle\phi|
= \begin{bmatrix} \cos^2\frac{\theta}{2} & \frac{1}{2}e^{-i\varphi}\sin\theta \\ \frac{1}{2}e^{i\varphi}\sin\theta & \sin^2\frac{\theta}{2} \end{bmatrix}
= I_0 + \sin\theta\cos\varphi\, I_x + \sin\theta\sin\varphi\, I_y + \cos\theta\, I_z, \tag{14.5}$$

where the product operators are defined as

$$I_0 = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
I_x = \frac{1}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
I_y = \frac{1}{2}\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad
I_z = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \tag{14.6}$$

They differ from the identity and the Pauli matrices only by the constant
factor 1/2 and obey analogous commutation relations. Upon collecting the
coefficients of $I_x$, $I_y$, and $I_z$, we obtain a vector

$$v = [\sin\theta\cos\varphi \;\; \sin\theta\sin\varphi \;\; \cos\theta]^T,$$

which is called a Bloch vector. In essence, we have defined a mapping from
the set of unit vectors $|\phi\rangle \in \mathbb{C}^2$ to the set of unit
vectors $v \in \mathbb{R}^3$. We have good reasons to ignore the coefficient
of $I_0$, because it has no effect on the spectroscopy and remains unchanged
under any unitary transformation. Each Bloch vector determines a point on
the unit sphere, called the Bloch sphere, which is displayed in Figure 14.7
[69, 83]. Bloch vectors have proven to be a very good tool for NMR quantum
operations.
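
The map from a state vector to its Bloch vector can be checked numerically. The sketch below assumes the parameterization $\cos(\theta/2)|0\rangle + e^{i\varphi}\sin(\theta/2)|1\rangle$ of (14.4) and reads off the components as expectation values of the Pauli matrices, $v_k = \mathrm{Tr}(\rho\,\sigma_k)$; the function names are hypothetical.

```python
import cmath
import math


def state_from_angles(theta, phi):
    """Amplitudes (a, b) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]


def bloch_vector(state):
    """Bloch vector v_k = Tr(rho sigma_k) for rho = |state><state|."""
    a, b = state
    coherence = a.conjugate() * b  # equals rho_10, the lower-left entry
    return [2 * coherence.real,          # <sigma_x>
            2 * coherence.imag,          # <sigma_y>
            abs(a) ** 2 - abs(b) ** 2]   # <sigma_z>


theta, phi = 1.1, 2.3
v = bloch_vector(state_from_angles(theta, phi))
expected = [math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta)]
length = math.sqrt(sum(x * x for x in v))
```

The components agree with $[\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta]$, and the vector has unit length for every pure state.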
Fig. 14.7 The Bloch sphere representation of a quantum state.


The mapping defined above is surjective, because every point on the Bloch
sphere can be written as a unit vector
$v = [\sin\theta\cos\varphi \;\; \sin\theta\sin\varphi \;\; \cos\theta]^T$
for some pair $(\theta, \varphi)$. Conversely, if
$v(\theta', \varphi') = v(\theta, \varphi)$, we get

$$\begin{aligned}
\cos\theta &= \cos\theta', \\
\sin\theta\cos\varphi &= \sin\theta'\cos\varphi', \\
\sin\theta\sin\varphi &= \sin\theta'\sin\varphi',
\end{aligned} \tag{14.7}$$

which can be used to show that the mapping is also injective if we identify
all pairs $(0, \varphi)$ with one point and all pairs $(\pi, \varphi)$ with
another point. In fact, these two sets correspond to the two states
$|0\rangle$ and $|1\rangle$, respectively.


14.2.3 Transformation of Quantum States: SU(2) and 
SO(3) 


When a quantum operation is applied to a quantum system, it may change
the quantum state of the system from one to another. The representation of
the operation depends on how the quantum state is represented. For example,
(14.4) leads to an operator or matrix $U$ which connects the new and old
states of a single-spin quantum system:

$$|\phi'\rangle = U|\phi\rangle,$$

where $|\phi'\rangle$ and $|\phi\rangle$ are the quantum states after and
before the operation, respectively. The fact that both states are unit
vectors implies that $U$ is a $2 \times 2$ unimodular complex matrix.
Moreover, $U$ is also unitary; that is, $U \in SU(2)$ (see footnote 2), a
Lie group endowed with a certain topology.

If the quantum state is represented by a three-dimensional Bloch vector, 
the effect of a unitary operation can be viewed as that of a rotation which 
rotates the Bloch sphere, and the operator is represented by a 3 x 3 real 
matrix S. If the quantum system has states v and v’ in Bloch vector form 
before and after the operation, respectively, then 


v’ = Sv. 


The matrix $S$ is a proper rotation matrix; that is, $S \in SO(3)$ (see
footnote 3). It is isometric and preserves the scalar triple product.

If both $S$ and $U$ represent the same physical operation, such as a
transformation induced by a series of pulses in NMR, there must be a
connection between them. One can show that there is a mapping $R$ from
$SU(2)$ to $SO(3)$ such that $S = R(U)$, for any $U \in SU(2)$ and its
corresponding Bloch-sphere representation $S$ [88]. Simple computation shows
that the entry of the matrix $S = R(U)$ at the $k$th row and $i$th column is
given as

$$S_{ki} = \mathrm{Tr}(\sigma_k U I_i U^\dagger), \tag{14.8}$$

where $\sigma_k$ are the Pauli matrices (see footnote 4), and Tr is the
trace operator. It can also be shown that $R$ is a two-to-one homomorphism
between $SU(2)$ and $SO(3)$ with kernel $\ker(R) = \{I, -I\}$. This agrees
with the fact that $U$ and $-U$ in $SU(2)$ represent the same operation,
because only the relative phase matters. The mapping is also surjective, so
it defines an isomorphism from the quotient group $SU(2)/\ker(R)$ to
$SO(3)$. We provide a more detailed discussion about this isomorphism in the
appendix.

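It is known that any $U \in SU(2)$ can be written in an exponential form
parameterized by an angle $\theta \in [0, 2\pi)$ and a unit vector
$n = [n_1, n_2, n_3]$ such that

$$U(\theta, n) = e^{-i(\theta/2)\, n \cdot \sigma}
= \begin{bmatrix} \cos\frac{\theta}{2} - i n_3 \sin\frac{\theta}{2} & -\sin\frac{\theta}{2}(n_2 + i n_1) \\ \sin\frac{\theta}{2}(n_2 - i n_1) & \cos\frac{\theta}{2} + i n_3 \sin\frac{\theta}{2} \end{bmatrix}
= \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, n \cdot \sigma, \tag{14.9}$$

where $\sigma = [\sigma_x, \sigma_y, \sigma_z]$. With this parameterization
of $SU(2)$, entries of $S = R(U)$ can be computed using (14.8) as

$$S_{ij} = R(U)_{ij} = \cos\theta\, \delta_{ij} + (1 - \cos\theta)\, n_i n_j + \sum_{k=1}^{3} \sin\theta\, \epsilon_{ikj}\, n_k. \tag{14.10}$$

2 SU(n) is the special unitary group of $n \times n$ matrices. An $n \times n$
matrix $A \in SU(n)$ if and only if $A$ is unitary (i.e., $A A^\dagger = I_n$,
where $A^\dagger$ is the Hermitian adjoint of $A$) and $\det A = 1$.

3 SO(3) denotes the special orthogonal group of $3 \times 3$ matrices. An
$n \times n$ matrix $A \in SO(n)$ if and only if $A$ is real, $A A^T = I_n$,
and $\det A = 1$.

4 The Pauli matrices are
$\sigma_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$,
$\sigma_y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$, and
$\sigma_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.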


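It should be noted now that $S$ coincides with a rotation about the axis
along $n$ by an angle $\theta$ in three-dimensional Euclidean space, as can
be seen by comparing $S_{ij}$ with the standard formula of a rotation
matrix. This interpretation is important in understanding the terminologies
used in NMR. For example, the rotations around the x-, y-, and z-axes
(x/y/z-rotations) by an arbitrary angle $\theta$ define the following three
unitary operators in $SU(2)$, respectively.

$$X_\theta = e^{-i\theta\sigma_x/2} = \begin{bmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix}, \tag{14.11}$$

$$Y_\theta = e^{-i\theta\sigma_y/2} = \begin{bmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix}, \tag{14.12}$$

$$Z_\theta = e^{-i\theta\sigma_z/2} = \begin{bmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{bmatrix}. \tag{14.13}$$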
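
The correspondence between $SU(2)$ and $SO(3)$ can be verified numerically. The sketch below builds $U(\theta, n) = \cos(\theta/2) I - i\sin(\theta/2)\, n\cdot\sigma$, computes the rotation matrix entry by entry from the trace formula $S_{ki} = \mathrm{Tr}(\sigma_k U I_i U^\dagger)$ with $I_i = \sigma_i/2$, and compares it with the axis-angle formula in the convention $\epsilon_{ikj}$ (our reading of (14.10)). It also confirms that $U$ and $-U$ give the same rotation. All helper names are assumptions made for this illustration.

```python
import math

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def unitary(theta, n):
    """U(theta, n) = cos(theta/2) I - i sin(theta/2) n.sigma."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * (1 if i == j else 0)
             - 1j * s * sum(n[k] * PAULI[k][i][j] for k in range(3))
             for j in range(2)] for i in range(2)]


def rotation_from_trace(U):
    """S[k][i] = Tr(sigma_k U (sigma_i / 2) U^dagger)."""
    Ud = dagger(U)
    S = []
    for k in range(3):
        row = []
        for i in range(3):
            half_sigma = [[x / 2 for x in r] for r in PAULI[i]]
            M = matmul(PAULI[k], matmul(U, matmul(half_sigma, Ud)))
            row.append((M[0][0] + M[1][1]).real)
        S.append(row)
    return S


def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) // 2


def rotation_from_axis_angle(theta, n):
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (1 if i == j else 0) + (1 - c) * n[i] * n[j]
             + s * sum(levi_civita(i, k, j) * n[k] for k in range(3))
             for j in range(3)] for i in range(3)]


theta, axis = 0.9, [1 / 3, 2 / 3, 2 / 3]
U = unitary(theta, axis)
S_trace = rotation_from_trace(U)
S_axis = rotation_from_axis_angle(theta, axis)
S_minus = rotation_from_trace([[-x for x in row] for row in U])
```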


14.2.4 Construction of Quantum Gates 


From Theorem 14.2.1, we know that the collection of all the one-qubit
gates and the two-qubit CNOT gate is universal. In addition, the following
fact [85, p. 175] holds for one-qubit quantum gates.


Theorem 14.2.2. Suppose $U$ is a unitary operation on a single qubit. Then
there exist real numbers $\alpha$, $\beta$, $\gamma$, and $\delta$ such that

$$U = e^{i\alpha} Z_\beta Y_\gamma Z_\delta.$$


For example, the Hadamard gate $H$ can be decomposed as
$H = e^{i\pi/2} Y_{\pi/2} Z_\pi$. Clearly, the x/y/z-rotation gates provide
building blocks sufficient to construct any one-qubit unitary gate. In this
section, we show how to realize these one-qubit rotation gates and the
two-qubit CNOT gate using NMR. We also show how to decouple the interaction
between two spins, a process called refocusing [85].
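
The Hadamard example can be confirmed by direct multiplication. The sketch below assumes the rotation conventions $Y_\theta = e^{-i\theta\sigma_y/2}$ and $Z_\theta = e^{-i\theta\sigma_z/2}$ of (14.12) and (14.13); the function name `compose` is hypothetical.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def y_rot(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def z_rot(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def compose(alpha, beta, gamma, delta):
    """Return e^{i alpha} Z_beta Y_gamma Z_delta as a 2x2 matrix."""
    M = matmul(z_rot(beta), matmul(y_rot(gamma), z_rot(delta)))
    ph = cmath.exp(1j * alpha)
    return [[ph * x for x in row] for row in M]


# alpha = pi/2, beta = 0, gamma = pi/2, delta = pi should give H.
H = compose(math.pi / 2, 0.0, math.pi / 2, math.pi)
s = 1 / math.sqrt(2)
expected = [[s, s], [s, -s]]
max_error = max(abs(H[i][j] - expected[i][j])
                for i in range(2) for j in range(2))
```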


14.2.4.1 One-Qubit Gates 
A single-spin system has Hamiltonian $H = -\boldsymbol{\mu} \cdot \mathbf{B}$,
where $\boldsymbol{\mu}$ is the magnetic moment, and

$$\mathbf{B} = B_0 \mathbf{e}_z + B_1(\mathbf{e}_x \cos(\omega t) + \mathbf{e}_y \sin(\omega t)) \tag{14.14}$$

is the magnetic field applied. $B_0$, a large constant, is the amplitude of
the static magnetic field, and $B_1$ is the amplitude of the oscillating
magnetic field in the xy plane. When $B_1 = 0$, the Hamiltonian and the
Schrödinger equation can be obtained as ([85])

$$H = \frac{\omega_0}{2}\sigma_z \tag{14.15}$$

and

$$i\partial_t|\psi(t)\rangle = H|\psi(t)\rangle, \tag{14.16}$$

respectively, where $\hbar$ has been divided out of both sides of the second
equation and dropped from $H$ in the first one for simplicity. The Larmor
frequency $\omega_0 = -B_0\gamma$ is determined by the nucleus and the
magnetic field; see (14.3). Assume that the initial state is
$|\psi_0\rangle = a_0|0\rangle + b_0|1\rangle$. Then the evolution of the
quantum state of the spin and of the density matrix can be solved directly
and is given as

$$|\psi(t)\rangle = e^{-iHt}|\psi_0\rangle
= \begin{bmatrix} e^{-i\omega_0 t/2} & 0 \\ 0 & e^{i\omega_0 t/2} \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix}
= e^{-i\omega_0 t/2} \begin{bmatrix} 1 & 0 \\ 0 & e^{i\omega_0 t} \end{bmatrix} |\psi_0\rangle,$$

$$\rho(t) = e^{-iHt} \rho(0) e^{iHt}.$$

This evolution is also called a chemical shift evolution, resembling the
precession of a magnet in a static field. In terms of the Bloch vector on
the Bloch sphere, it is exactly $Z_\theta$, the rotation operator around the
z-axis with $\theta = \omega_0 t$.

To achieve an x-rotation operator, we need a small magnetic field
transverse to the z-direction to control the evolution of the quantum state.
The Hamiltonian is obtained by choosing $B_1$ different from zero in (14.14):

$$H = -\boldsymbol{\mu} \cdot \mathbf{B} = \frac{\omega_0}{2}\sigma_z + \frac{\omega_1}{2}\left(\sigma_x \cos(\omega t) + \sigma_y \sin(\omega t)\right),$$

where $\omega_1$ depends on the x-y plane component $B_1$ of the magnetic
field, $\omega_1 = -B_1\gamma$. To solve the Schrödinger equation, we put
$|\psi(t)\rangle$ in a "frame" rotating with the magnetic field around the
z-axis at frequency $\omega$,
$|\phi(t)\rangle = e^{i\omega t\sigma_z/2}|\psi(t)\rangle$. With this
substitution, the Schrödinger equation (14.16) becomes

$$i\partial_t|\phi(t)\rangle = \left(e^{i\omega\sigma_z t/2} H e^{-i\omega\sigma_z t/2} - \frac{\omega}{2}\sigma_z\right)|\phi(t)\rangle. \tag{14.17}$$

Using the properties

$$\begin{aligned}
e^{i\omega\sigma_z t/2}\, \sigma_z\, e^{-i\omega\sigma_z t/2} &= \sigma_z, \\
e^{i\omega\sigma_z t/2}\, \sigma_x\, e^{-i\omega\sigma_z t/2} &= \sigma_x \cos(\omega t) - \sigma_y \sin(\omega t), \\
e^{i\omega\sigma_z t/2}\, \sigma_y\, e^{-i\omega\sigma_z t/2} &= \sigma_x \sin(\omega t) + \sigma_y \cos(\omega t),
\end{aligned} \tag{14.18}$$

we obtain

$$|\phi(t)\rangle = e^{-i\left(\frac{\omega_0 - \omega}{2}\sigma_z + \frac{\omega_1}{2}\sigma_x\right)t}|\phi(0)\rangle. \tag{14.19}$$

We know from (14.9) that this is a rotation around the axis

$$n = \frac{1}{\sqrt{1 + \left(\frac{\omega_1}{\omega_0 - \omega}\right)^2}} \left(\hat{z} + \frac{\omega_1}{\omega_0 - \omega}\,\hat{x}\right). \tag{14.20}$$

An important case is $\omega_0 = \omega$, also called the resonance case;
the name comes from the zero denominator in (14.20). By (14.19), we see that
a relatively weak transverse magnetic field causes a rotation around the
x-axis in the rotating frame:

$$|\psi(t)\rangle = e^{-i\omega\sigma_z t/2}|\phi(t)\rangle = e^{-i\omega\sigma_z t/2} e^{-i\omega_1 \sigma_x t/2}|\phi(0)\rangle = Z_{\omega_0 t} X_\theta |\psi(0)\rangle, \tag{14.21}$$

where $X_\theta = e^{-i\omega_1 t \sigma_x/2}$ and $\theta = \omega_1 t$. By
applying another $Z_{-\omega_0 t}$, we obtain a rotation $X_\theta$ as
desired. Because the frequency of the precession is in the radio frequency
band, the field applied is called an RF pulse.

When $|\omega_0 - \omega| \gg \omega_1$, the rotation axis is almost along
$z$ and the RF pulse has almost no effect on the spin:

$$|\psi(t)\rangle = e^{-i\omega\sigma_z t/2}|\phi(t)\rangle \approx e^{-i\omega_0 \sigma_z t/2}|\psi(0)\rangle = Z_{\omega_0 t}|\psi(0)\rangle;$$

thus we can tell one qubit from another because their resonance frequencies
are designed to be different. There are still cases where the difference of
resonance frequencies between spins is not large enough. The RF pulse may
then cause similar rotations on all those spins. To avoid or at least
minimize this, a soft pulse is applied instead of the so-called hard pulse.
It is a pulse with a longer time span and a weaker magnetic field, in other
words, a smaller $\omega_1$. This strategy makes these "close" qubits fall
into the $|\omega_0 - \omega| \gg \omega_1$ case.
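
The selectivity argument can be made quantitative with a short calculation. In the rotating frame the spin rotates about the axis of (14.20) at the rate $\sqrt{(\omega_0 - \omega)^2 + \omega_1^2}$, so a spin starting in $|0\rangle$ is flipped to $|1\rangle$ with probability $\frac{\omega_1^2}{(\omega_0-\omega)^2 + \omega_1^2}\sin^2\!\left(\tfrac{t}{2}\sqrt{(\omega_0-\omega)^2 + \omega_1^2}\right)$; this follows from the off-diagonal entry of (14.9). The sketch below evaluates this expression; the function name and the numerical values (a 1 kHz drive strength and a detuning of ten times that) are hypothetical and chosen only for illustration.

```python
import math


def flip_probability(detuning, omega1, t):
    """Probability that a spin starting in |0> is found in |1> after time t.

    detuning = omega_0 - omega, omega1 = RF drive strength (omega1 > 0).
    """
    omega_eff = math.hypot(detuning, omega1)
    return (omega1 / omega_eff) ** 2 * math.sin(omega_eff * t / 2) ** 2


omega1 = 2 * math.pi * 1.0e3   # assumed drive strength, rad/s
t_pi = math.pi / omega1        # duration of a pi pulse on the resonant spin

on_resonance = flip_probability(0.0, omega1, t_pi)
off_resonance = flip_probability(10 * omega1, omega1, t_pi)
bound = 1 / (1 + 10 ** 2)      # maximum possible flip probability when detuned
```

With the detuning fixed, lowering $\omega_1$ (a softer, longer pulse) shrinks the prefactor $\omega_1^2/((\omega_0-\omega)^2 + \omega_1^2)$, which is the quantitative content of the soft-pulse strategy.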
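
If we change the magnetic field to

$$\mathbf{B} = B_0 \mathbf{e}_z + B_1(\mathbf{e}_x \cos(\omega_0 t + \alpha) + \mathbf{e}_y \sin(\omega_0 t + \alpha)), \tag{14.22}$$

the Hamiltonian will become

$$H = \frac{\omega_0}{2}\sigma_z + \frac{\omega_1}{2}\left(\sigma_x \cos(\omega_0 t + \alpha) + \sigma_y \sin(\omega_0 t + \alpha)\right), \tag{14.23}$$

where $\omega_1$ is defined as before. The RF field is almost the same as
(14.14) in the resonance case except for a phase shift. Using the same
rotating frame as before with $\omega = \omega_0$, we obtain

$$i\partial_t|\phi(t)\rangle = \frac{\omega_1}{2}\left(\sigma_x \cos\alpha + \sigma_y \sin\alpha\right)|\phi(t)\rangle, \tag{14.24}$$

after simplification. After time duration $t$, the new system state is given as

$$|\phi(t)\rangle = e^{-i(\omega_1/2)(\sigma_x \cos\alpha + \sigma_y \sin\alpha)t}|\phi(0)\rangle, \tag{14.25}$$

and the evolution operator can be computed using (14.9) as

$$U_{\theta,\alpha} = e^{-i(\omega_1/2)(\sigma_x \cos\alpha + \sigma_y \sin\alpha)t}
= \begin{bmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2}\, e^{-i\alpha} \\ -i\sin\frac{\theta}{2}\, e^{i\alpha} & \cos\frac{\theta}{2} \end{bmatrix}, \tag{14.26}$$

where $\theta = \omega_1 t$. This is a one-qubit rotation operator, and is
sometimes called a Rabi rotation gate. When $\alpha = \pi/2$,

$$U_{\theta,\pi/2} = \begin{bmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix} = Y_\theta. \tag{14.27}$$

We have achieved a y-rotation operator just by adding a phase shift to the
RF field.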


14.2.4.2 Two-Qubit Gates 


The construction of a two-qubit gate requires the coupling of two spins. In a
liquid NMR sample, J-coupling is the dominant coupling between spins.
Under the assumption that the resonance frequency difference between the
coupled spins is much larger than the strength of the coupling (the so-called
weak coupling regime), the total Hamiltonian of a two-spin system without
transverse field may be given as

$$H = \frac{1}{2}\omega_1\sigma_z^1 + \frac{1}{2}\omega_2\sigma_z^2 + \frac{1}{2}J\sigma_z^1\sigma_z^2, \tag{14.28}$$

where $\omega_i$ is the frequency corresponding to spin $i$, $\sigma_z^i$ is
the z-projection operator of spin $i$, for $i = 1, 2$, and $J$ is the
coupling coefficient. Take the chloroform in Figure 14.6, for example
[16, 69]. In an 11.7 T magnetic field, the precession frequency of the proton
is about $2\pi \times 500$ MHz and the precession frequency of $^{13}$C is
about $2\pi \times 125$ MHz. The coupling constant $J$ is about
$2\pi \times 100$ Hz. Here we set $B_1 = 0$, which means no transverse
magnetic field is applied and terms such as $\sigma_x$, $\sigma_y$ do not
appear. The remaining terms in the Hamiltonian only contain the operators
$\sigma_z^1$ and $\sigma_z^2$, which commute. Thus, we can obtain the
eigenstates and eigenvalues of this two-spin system, and we map the set of
eigenstates to the standard basis of $\mathbb{C}^4$, as follows.

$$|00\rangle = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
|01\rangle = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
|10\rangle = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
|11\rangle = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}; \tag{14.29}$$

$$\begin{aligned}
H|00\rangle &= k_{00}|00\rangle, & k_{00} &= \tfrac{1}{2}\omega_1 + \tfrac{1}{2}\omega_2 + \tfrac{1}{2}J; \\
H|01\rangle &= k_{01}|01\rangle, & k_{01} &= \tfrac{1}{2}\omega_1 - \tfrac{1}{2}\omega_2 - \tfrac{1}{2}J; \\
H|10\rangle &= k_{10}|10\rangle, & k_{10} &= -\tfrac{1}{2}\omega_1 + \tfrac{1}{2}\omega_2 - \tfrac{1}{2}J; \\
H|11\rangle &= k_{11}|11\rangle, & k_{11} &= -\tfrac{1}{2}\omega_1 - \tfrac{1}{2}\omega_2 + \tfrac{1}{2}J.
\end{aligned} \tag{14.30}$$

Because the matrix is diagonal, the evolution of this two-spin system can
be easily derived as

$$|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle
= \begin{bmatrix} e^{-ik_{00}t} & & & \\ & e^{-ik_{01}t} & & \\ & & e^{-ik_{10}t} & \\ & & & e^{-ik_{11}t} \end{bmatrix} |\psi(0)\rangle. \tag{14.31}$$


We can also rewrite the one-qubit rotation operators for this system of two
spin-1/2 nuclei in matrix form with respect to the same basis:

$$Z^1_{\pi/2} = \begin{bmatrix} e^{-i\pi/4} & & & \\ & e^{-i\pi/4} & & \\ & & e^{i\pi/4} & \\ & & & e^{i\pi/4} \end{bmatrix}, \tag{14.32}$$

$$Z^2_{-\pi/2} = \begin{bmatrix} e^{i\pi/4} & & & \\ & e^{-i\pi/4} & & \\ & & e^{i\pi/4} & \\ & & & e^{-i\pi/4} \end{bmatrix}, \tag{14.33}$$

$$Y^2_{\pi/2} = \frac{\sqrt{2}}{2} \begin{bmatrix} 1 & -1 & & \\ 1 & 1 & & \\ & & 1 & -1 \\ & & 1 & 1 \end{bmatrix}, \tag{14.34}$$

$$Y^2_{-\pi/2} = \frac{\sqrt{2}}{2} \begin{bmatrix} 1 & 1 & & \\ -1 & 1 & & \\ & & 1 & 1 \\ & & -1 & 1 \end{bmatrix}, \tag{14.35}$$

where $Z^i_\theta$ is the rotation operator for spin $i$ by angle $\theta$
around the z-axis while keeping the other spin unchanged, and all
$Y^i_\theta$ are similarly defined operators about the y-axis; see (14.12).
A careful reader may raise issues about the one-qubit gates we have obtained
in Section 14.2.4.1, because the coupling between two qubits always exists
and has not been considered. We need to turn off the coupling when we only
want to operate on one spin but the coupling is nonnegligible. This is in
fact one of the major characteristic difficulties associated with NMR
quantum computing technology. A special technique called refocusing is
useful here. It works as follows. While we are working on the target spin, we
apply a soft $\pi$ pulse on the spare spin that we do not want to change, at
the middle point of the operation time duration. The effect is that the
coupling before the pulse cancels the coupling after the pulse, so the
result of no coupling is achieved. Another $\pi$ pulse is needed to turn the
spin back. All pulses are soft.

This technique is so important that we now state it here as a theorem.


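Theorem 14.2.3. Let $H = (\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2 + A$
be a given Hamiltonian, where $A$ is a Hamiltonian that does not act on
spin 1 and commutes with $\sigma_z^2$. Then the evolution operators of $A$
and $H$ satisfy

$$e^{-iAt} = -X^1_\pi e^{-iHt/2} X^1_\pi e^{-iHt/2}; \tag{14.36}$$

that is, the collective evolution of the quantum system with Hamiltonian $H$
and two additional $X^1_\pi$-pulses at the middle and the end of the time
duration equals that of a system with Hamiltonian $A$ (up to a global phase
shift $\pi$, or a factor $-1$).


Proof. Assume that the time duration is $t$ and write

$$U = X^1_\pi e^{-iHt/2} X^1_\pi e^{-iHt/2}. \tag{14.37}$$

Note that $X^1_\pi = e^{-i(\pi/2)\sigma_x^1}$ commutes with $A$, which
contains no operators acting on spin 1; thus

$$U = X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} e^{-iAt}. \tag{14.38}$$

It suffices to prove that the part before $e^{-iAt}$ satisfies

$$B = X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} = -I. \tag{14.39}$$

We first check the effect of $B$ on the four basis vectors. We have

$$\begin{aligned}
B|11\rangle &= X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2}|11\rangle \\
&= e^{-i((-\omega_1 + J)/4)t}\,(-i)\, X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2}|01\rangle \\
&= (-i)\, e^{-i((-\omega_1 + J)/4)t}\, e^{-i((\omega_1 - J)/4)t}\, X^1_\pi |01\rangle \\
&= (-i)^2 |11\rangle \\
&= -|11\rangle,
\end{aligned} \tag{14.40}$$

$$\begin{aligned}
B|01\rangle &= X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2} X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2}|01\rangle \\
&= e^{-i((\omega_1 - J)/4)t}\,(-i)\, X^1_\pi e^{-i((\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2)t/2}|11\rangle \\
&= (-i)\, e^{-i((\omega_1 - J)/4)t}\, X^1_\pi e^{-i((-\omega_1 + J)/4)t}|11\rangle \\
&= (-i)^2 |01\rangle \\
&= -|01\rangle,
\end{aligned} \tag{14.41}$$

and similarly,

$$B|10\rangle = -|10\rangle, \qquad B|00\rangle = -|00\rangle. \tag{14.42}$$

In the computation above, we have used the fact that $X^1_\pi$ has no effect
on the second spin and that the four basis vectors $|00\rangle$,
$|01\rangle$, $|10\rangle$, and $|11\rangle$ are eigenstates of the operator
$(\omega_1/2)\sigma_z^1 + (J/2)\sigma_z^1\sigma_z^2$. The result shows that
$B = -I$, and we are done.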
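
The refocusing identity can also be checked numerically for the two-spin Hamiltonian $(\omega_1/2)\sigma_z^1 + (\omega_2/2)\sigma_z^2 + (J/2)\sigma_z^1\sigma_z^2$, taking $A = (\omega_2/2)\sigma_z^2$. The sketch below is an illustration under our own assumptions: the basis order is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, the pulse $X^1_\pi = e^{-i(\pi/2)\sigma_x} \otimes I = (-i\sigma_x) \otimes I$ is treated as ideal and instantaneous, and the numerical values of $\omega_1$, $\omega_2$, $J$, and $t$ are arbitrary.

```python
import cmath


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def refocusing_error(w1, w2, J, t):
    """Largest entry of | -X e^{-iHt/2} X e^{-iHt/2} - e^{-iAt} |."""
    # Eigenvalues of H on |00>, |01>, |10>, |11>.
    k = [(w1 + w2 + J) / 2, (w1 - w2 - J) / 2,
         (-w1 + w2 - J) / 2, (-w1 - w2 + J) / 2]
    half_step = diag([cmath.exp(-1j * kk * t / 2) for kk in k])
    x1_pi = kron([[0, -1j], [-1j, 0]], [[1, 0], [0, 1]])
    U = matmul(x1_pi, matmul(half_step, matmul(x1_pi, half_step)))
    # e^{-iAt} with A = (w2/2) sigma_z acting on spin 2 only.
    target = diag([cmath.exp(-1j * w2 * t / 2), cmath.exp(1j * w2 * t / 2)] * 2)
    return max(abs(-U[i][j] - target[i][j])
               for i in range(4) for j in range(4))


error = refocusing_error(3.0, 1.3, 0.2, 0.7)
```

The error vanishes to machine precision for any choice of the four parameters, in line with the proof: both the chemical shift of spin 1 and the coupling term are removed.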


When the Hamiltonian is given in the form (14.28), the above theorem
tells us that both the chemical shift evolution (precession) and the
J-coupling effect on spin 1 are removed and only the term
$(\omega_2/2)\sigma_z^2$ remains. We obtain a z-rotation of spin 2 while
freezing spin 1. By combining it with several hard pulses, we can also
achieve any arbitrary rotation on spin 2 with the motion of spin 1 frozen
[73]. A similar computation shows that a hard $\pi$ pulse applied at the
middle point of the time duration cancels the chemical shift evolution of
both spins. This can be seen by checking the identity

$$e^{-iHt/2} X^1_\pi X^2_\pi e^{-iHt/2} = -\begin{bmatrix} & & & e^{-iJt/2} \\ & & e^{iJt/2} & \\ & e^{iJt/2} & & \\ e^{-iJt/2} & & & \end{bmatrix}. \tag{14.43}$$

Another hard $\pi$ pulse can rotate the two spins back, so we have achieved
an evolution which has only the J-coupling effect, denoted by $Z_\theta$
with $\theta = Jt$:

$$Z_\theta = \begin{bmatrix} e^{-i\theta/2} & & & \\ & e^{i\theta/2} & & \\ & & e^{i\theta/2} & \\ & & & e^{-i\theta/2} \end{bmatrix},$$

and when $\theta = \pi/2$,

$$Z_{\pi/2} = \begin{bmatrix} e^{-i\pi/4} & & & \\ & e^{i\pi/4} & & \\ & & e^{i\pi/4} & \\ & & & e^{-i\pi/4} \end{bmatrix}. \tag{14.44}$$
Fig. 14.8 The quantum circuit used to realize a quantum controlled-not gate.


Although we give only an example of the 2-qubit system above, the
reader should note that a general method is available to retain only the
couplings wanted while keeping all the others cancelled for multiqubit
systems [57, 71, 73]. Combining the operators in (14.32) through (14.35) and
(14.44), we can now construct a CNOT gate as in Figure 14.8, which includes
four 1-qubit $\pi/2$ rotations around the y- or z-axes and one 2-qubit
$\pi/2$ rotation. The total operator, denoted by $CN$, can be computed as

$$CN = Z^1_{\pi/2}\, Y^2_{-\pi/2}\, Z^2_{-\pi/2}\, Z_{\pi/2}\, Y^2_{\pi/2} = e^{-i\pi/4} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \tag{14.45}$$

which is a CNOT gate up to a phase of $-\pi/4$ [69].
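
The pulse sequence can be multiplied out numerically as a check. The sketch below assumes the conventions $Y_\theta = e^{-i\theta\sigma_y/2}$, $Z_\theta = e^{-i\theta\sigma_z/2}$ for the one-qubit rotations, the coupling evolution $e^{-i\theta\sigma_z^1\sigma_z^2/2}$ for the two-qubit step, and our reading of the pulse order in (14.45): $Y^2_{\pi/2}$ first, then the coupling with $\theta = \pi/2$, then $Z^2_{-\pi/2}$, $Y^2_{-\pi/2}$, and $Z^1_{\pi/2}$. Helper names are hypothetical.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


I2 = [[1, 0], [0, 1]]


def y_rot(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def z_rot(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def zz_coupling(theta):
    """exp(-i theta sigma_z^1 sigma_z^2 / 2) in the basis 00, 01, 10, 11."""
    m, p = cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)
    return [[m, 0, 0, 0], [0, p, 0, 0], [0, 0, p, 0], [0, 0, 0, m]]


def cnot_from_pulses():
    q = math.pi / 2
    # Time order: the first entry is applied first.
    sequence = [kron(I2, y_rot(q)), zz_coupling(q), kron(I2, z_rot(-q)),
                kron(I2, y_rot(-q)), kron(z_rot(q), I2)]
    U = kron(I2, I2)
    for gate in sequence:
        U = matmul(gate, U)
    return U


CN = cnot_from_pulses()
global_phase = cmath.exp(-1j * math.pi / 4)
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
max_error = max(abs(CN[i][j] - global_phase * CNOT[i][j])
                for i in range(4) for j in range(4))
```

Under these conventions the product equals $e^{-i\pi/4}$ times the CNOT matrix, with spin 1 as the control and spin 2 as the target.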

We have shown how to construct one-qubit gates and the two-qubit CNOT
gate using NMR technology. The simple pulse design works fine in ideal
situations. In practice, errors arise from various factors. Decoherence
causes the loss of quantum information with time. Thus, all operations
should be completed within a short time, roughly constrained by the energy
relaxation time $T_1$ and the phase randomization time $T_2$. Again, take
the chloroform as an example. For protons, $T_1 \approx 7$ sec and
$T_2 \approx 2$ sec; for carbons, $T_1 \approx 16$ sec and
$T_2 \approx 0.2$ sec [16, 69]. The pulses have to be short enough so that
all of them fit within this time window. Ideally, a pulse can be completed
quite fast, but this may incur undesirable rotations in other qubits because
the frequency bandwidth is inversely proportional to the time length of the
pulse. A shorter and stronger pulse has a wider frequency band that may
cover the resonance frequency of another spin, an effect called
cross-talking. It should also be noted that both $T_1$ and $T_2$ are defined
and measured in a simplified situation, and they can only be used as an
approximation of the decoherence rate for the quantum computation. Coupling
is also a problem that makes the pulse design much more complicated.
Finally, no experimental facility is perfect, which may introduce more
errors. Typical error sources include inhomogeneities in the static and RF
fields, pulse length calibration errors, frequency offsets, and pulse
timing/phase imperfections.

If the quantum circuit can be simplified and the number of gates needed 
is reduced, the requirements on the pulses can be alleviated. Mathematicians 
are looking for methods to find time-optimal pulse sequences [43, 60, 61, 99], 
with the goal of finding the shortest path between the identity and a point 
in the space of SU(n) allowed by the system and the control Hamiltonians. 
In addition, NMR spectroscopists have already developed advanced pulse 
techniques to deal with system errors such as cross-talking and coupling. They 
turn out to work well and are now widely used in NMR quantum computation. 
Such techniques include composite pulses [21, 36, 72, 54, 55, 108] and pulse 
shaping. The latter consists mainly of two methods: phase profiles [89] and 
amplitude profiles [39, 66]. 


14.2.5 Initialization 


An NMR sample eventually will go into its equilibrium state when no RF
pulse is applied for a long time. Then the density matrix is proportional
to $e^{-H/kT}$ according to the Boltzmann distribution, where
$k = 1.381 \times 10^{-23}$ J/K and $T$ is the absolute temperature.
Normally, the thermal energy $kT$ is far larger than the energy difference
between the up and down states of the spin, and $H/kT$ is very small, about
$10^{-4}$. We also make the assumption that the coupling terms are small
enough compared with the resonant frequency; thus we can make a reasonable
approximation of the equilibrium state density matrix of a system with $n$
spins:

$$\rho_{eq} = \frac{e^{-H/kT}}{\mathrm{Tr}(e^{-H/kT})} \approx I_0 - (\epsilon_1\sigma_z^1 + \epsilon_2\sigma_z^2 + \cdots + \epsilon_n\sigma_z^n). \tag{14.46}$$

Of the four operators appearing in the density matrix (14.5), only those
with zero trace can be observed in NMR. The operator $I_0$ is invisible, and
moreover, it remains invariant under any unitary similarity transformation.
Therefore, we only need to take care of the zero-trace part of the initial
density matrix, noting that only that part (called the deviation) is
effective. Most algorithms prefer an initial state such as

$$\rho_0 = \frac{1 - \epsilon}{2^n} I + \epsilon\, |00\cdots 0\rangle\langle 00\cdots 0|,$$

which is an example of the so-called pseudo-pure states, corresponding to the
pure state $|00\cdots 0\rangle$.
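
The order of magnitude quoted above for $H/kT$ can be reproduced with a one-line estimate: the ratio of the Zeeman energy $h f$ to the thermal energy $kT$. The values below are assumptions chosen for illustration (a 500 MHz precession frequency, as for protons at 11.7 T, and a room temperature of 300 K); the constants are the SI-defined values of the Planck and Boltzmann constants.

```python
PLANCK_H = 6.62607015e-34      # Planck constant, J s
BOLTZMANN_K = 1.380649e-23     # Boltzmann constant, J/K


def zeeman_to_thermal_ratio(frequency_hz, temperature_k):
    """Ratio h*f / (k*T) of the level splitting to the thermal energy."""
    return PLANCK_H * frequency_hz / (BOLTZMANN_K * temperature_k)


# Assumed conditions: 500 MHz precession frequency at 300 K.
ratio = zeeman_to_thermal_ratio(500e6, 300.0)
```

The result is slightly below $10^{-4}$, which is why only a tiny deviation from the maximally mixed state is available as signal.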

To initialize the system to a pseudo-pure state as above, we may use a
scheme called averaging. Let us explain this for a 2-spin system. Suppose we
have three 2-spin subsystems with density matrices

$$\rho_1 = \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & d \end{bmatrix}, \quad
\rho_2 = \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & d & 0 \\ 0 & 0 & 0 & b \end{bmatrix}, \quad
\rho_3 = \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & d & 0 & 0 \\ 0 & 0 & b & 0 \\ 0 & 0 & 0 & c \end{bmatrix}, \tag{14.47}$$

respectively, where $a$, $b$, $c$, and $d$ are nonnegative, and
$a + b + c + d = 1$. These are three diagonal matrices with three of their
diagonal elements in cyclic permutation.

Now, we mix these three subsystems together (for an $n$-qubit system, we
may have $2^n - 1$ subsystems) and assume that the three subsystems have the
same signal scale. Because the readout is linear with respect to the initial
state, we are in fact working on a system with an effective initial density
matrix

$$\bar{\rho} = \frac{1}{3}\sum_{i=1}^{3}\rho_i
= \frac{1}{3}\begin{bmatrix} 3a & & & \\ & b+c+d & & \\ & & b+c+d & \\ & & & b+c+d \end{bmatrix}
= \frac{b+c+d}{3}\, I + \frac{4a-1}{3}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \tag{14.48}$$

which is a pseudo-pure state corresponding to $|00\cdots 0\rangle$.
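
The averaging step can be reproduced with exact arithmetic. The sketch below represents each diagonal density matrix by its four diagonal entries, averages the three cyclic permutations, and splits the result into a multiple of the identity plus a deviation. The function names and the sample populations are hypothetical.

```python
from fractions import Fraction


def temporal_average(a, b, c, d):
    """Average the diagonals (a,b,c,d), (a,c,d,b), (a,d,b,c)."""
    rhos = [(a, b, c, d), (a, c, d, b), (a, d, b, c)]
    return [sum(column) / 3 for column in zip(*rhos)]


def split_pseudo_pure(avg):
    """Write avg as identity_weight * I + deviation (both as diagonals)."""
    identity_weight = avg[1]
    deviation = [x - identity_weight for x in avg]
    return identity_weight, deviation


# Assumed populations with a + b + c + d = 1.
a, b, c, d = Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)
avg = temporal_average(a, b, c, d)
weight, deviation = split_pseudo_pure(avg)
```

For these populations the identity weight is $(b+c+d)/3 = 1/5$ and the deviation is $(4a-1)/3 = 1/5$ on the first diagonal entry only, so the averaged state behaves like the pure state with all spins in $|0\rangle$ as far as the observable signal is concerned.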

Various methods have been developed to achieve this effect of averaging.
Because $\rho_1$, $\rho_2$, and $\rho_3$ differ only by a permutation of the
diagonal elements, a sequence of CNOT pulses can be used to transform one to
another. In most cases, we only have one sample; the same algorithm can be
repeated on that sample three times but with different initial states
$\rho_1$, $\rho_2$, and $\rho_3$, respectively. Finally, after all three
outputs are obtained and added together (averaged), we achieve the same
result as we would get if the algorithm were run on a system with the
expected initial state $|00\cdots 0\rangle$. This is called "temporal
averaging" [63]. Gradient fields can also be used to divide the sample into
different slices in space which are prepared in different initial states, so
that the averaging is realized spatially; this is called "spatial averaging"
[19]. The number of experiments and pulses needed grows very large when the
number of qubits increases. For example, nine experiments are combined in
order to prepare one pseudo-pure state for a 5-qubit system and 48 pulses
are used to form one pseudo-pure state in a 7-qubit system [41] after
modifications such as logical labeling [42, 106] and selective saturation
[62].
14.2.6 Measurement


An NMR computer differs from other quantum computers in that it works
on an ensemble of spins instead of just a single one. It produces an
observable macroscopic signal that can be picked up by a set of coils
positioned on the x-y plane, as shown in Figure 14.3. The signal, called the
free induction decay (FID), measures the rate of change of the magnetic
field created by a large number of spins in the sample rotating around the
z-axis. Due to relaxation, peaks of the Fourier transform of the signal, or
spectra, have width. However, we do not need to worry about that because it
does not make any substantial difference in our discussion here. One
disadvantage is that the readout from NMR is an average over all the
possible states, in contrast to most existing quantum algorithms that ask
for the occurrence of only a single state. But it is possible to modify
ordinary quantum algorithms to make NMR results usable.

The magnetization detected by the coil in Figure 14.3 is proportional to 
the trace of the product of the density matrix with $\sigma_+ = \sigma_x + i\sigma_y$: 

$$M_x + iM_y = nV\langle\mu_x + i\mu_y\rangle = nV\gamma\hbar\,\mathrm{Tr}\bigl(\rho(\sigma_x + i\sigma_y)\bigr), \tag{14.49}$$

where $\gamma$ is the magnetogyric ratio as in (14.3) and $\rho$ is the density matrix. 
When the external RF magnetic field is removed, the density matrix 
evolves according to the system’s Hamiltonian, as we discussed earlier. If we 
decompose the density matrix into a sum of product operators as in (14.5), 
only $I_x$ and $I_y$ contribute to the readout. We cannot “see” the coefficients 
of $I_0$ and $I_z$. Recall (14.18): if a one-spin system begins from the density matrix 
$\rho_0 = I_0 + \sin\theta\cos\psi\,I_x + \sin\theta\sin\psi\,I_y + \cos\theta\,I_z$, the magnetization will rotate 
with the resonant frequency as 

$$\begin{aligned}
M_x + iM_y &= C\,\mathrm{Tr}\bigl(e^{-iHt}\rho_0 e^{iHt}\sigma_+\bigr) \\
&= C\,\mathrm{Tr}\bigl(e^{-iHt}(I_0 + \sin\theta\cos\psi\,I_x + \sin\theta\sin\psi\,I_y + \cos\theta\,I_z)e^{iHt}\sigma_+\bigr) \\
&= C\,\mathrm{Tr}\bigl((\sin\theta\cos\psi(\cos(\omega t)I_x + \sin(\omega t)I_y) + \sin\theta\sin\psi(\cos(\omega t)I_y - \sin(\omega t)I_x))\sigma_+\bigr) \\
&= C\sin\theta\,e^{i(\omega t+\psi)},
\end{aligned} \tag{14.50}$$


where $C = nV\gamma\hbar$. This rotating magnetization induces an oscillating 
electric potential in the receiver coils, which is processed by a computer 
to generate the spectra. Note that the signal is proportional to $\sin\theta$. If an $x$ 
rotation with angle $\pi/2$ is applied to the spin before the measurement, the 
magnetization will become 


$$M_x + iM_y = \frac{C}{2}(\sin\theta - i\cos\theta)e^{i\omega t}.$$

For simplicity, we have chosen $\psi = 0$. The imaginary part is proportional to 
the population difference: 

$$\cos\theta = \cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2}.$$
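
The one-spin readout can be checked numerically. The following is a minimal sketch under assumptions that are ours rather than the text's: $I_a = \sigma_a/2$, $I_0$ equal to half the identity, $H = \omega I_z$, the constant $C$ set to 1, and the $x$ rotation taken as $e^{-i\pi\sigma_x/4}$. It verifies that the free-precession signal is $\sin\theta\,e^{i(\omega t+\psi)}$ and that, after the $\pi/2$ pulse, the imaginary part is proportional to $\cos\theta$; the overall constant depends on the normalization chosen.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

SIGMA_PLUS = [[0, 2], [0, 0]]  # sigma_x + i*sigma_y

def rho0(theta, psi):
    """I_0 + sin(theta)cos(psi) I_x + sin(theta)sin(psi) I_y + cos(theta) I_z,
    assuming I_a = sigma_a/2 and I_0 = identity/2."""
    x = math.sin(theta) * math.cos(psi) / 2
    y = math.sin(theta) * math.sin(psi) / 2
    z = math.cos(theta) / 2
    return [[0.5 + z, x - 1j * y], [x + 1j * y, 0.5 - z]]

def signal(rho, omega, t):
    """Tr(e^{-iHt} rho e^{iHt} sigma_+) with H = omega*I_z and C = 1."""
    u = [[cmath.exp(-0.5j * omega * t), 0], [0, cmath.exp(0.5j * omega * t)]]
    return trace(matmul(matmul(matmul(u, rho), dagger(u)), SIGMA_PLUS))

def x_pulse(rho):
    """Apply X rho X^dagger with X = exp(-i*pi/4*sigma_x) (assumed convention)."""
    s = 1 / math.sqrt(2)
    x = [[s, -1j * s], [-1j * s, s]]
    return matmul(matmul(x, rho), dagger(x))

theta, psi, omega, t = 0.7, 0.3, 2.0, 1.5
free = signal(rho0(theta, psi), omega, t)
expected = math.sin(theta) * cmath.exp(1j * (omega * t + psi))
after_pulse = signal(x_pulse(rho0(theta, 0.0)), omega, 0.0)
print(abs(free - expected), after_pulse)
```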
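Computation for a two-spin system is complicated, so we give only some 
partial results here. The purpose is to point out what methodology is used. 
We still use the basis given by (14.29) and the Hamiltonian in (14.28). The 
system begins from a density matrix 

$$\rho_0 = \begin{pmatrix}
\rho_{11} & \rho_{12} & \rho_{13} & \rho_{14} \\
\rho_{21} & \rho_{22} & \rho_{23} & \rho_{24} \\
\rho_{31} & \rho_{32} & \rho_{33} & \rho_{34} \\
\rho_{41} & \rho_{42} & \rho_{43} & \rho_{44}
\end{pmatrix}. \tag{14.51}$$

The operator $\sigma_+$ is a sum of operators from the two subsystems: 

$$\sigma_+ = \sigma_+^1 + \sigma_+^2 = \begin{pmatrix}
0 & 2 & 2 & 0 \\
0 & 0 & 0 & 2 \\
0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0
\end{pmatrix}. \tag{14.52}$$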

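The magnetization in the $xy$ plane is composed of four frequencies: 

$$\begin{aligned}
M_x + iM_y &= C\,\mathrm{Tr}\bigl(e^{-iHt}\rho_0 e^{iHt}\sigma_+\bigr) \\
&= C\bigl(\rho_{31}e^{i(\omega_1+J)t} + \rho_{42}e^{i(\omega_1-J)t} + \rho_{43}e^{i(\omega_2-J)t} + \rho_{21}e^{i(\omega_2+J)t}\bigr).
\end{aligned} \tag{14.53}$$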


The spectrum has two pairs of peaks, one pair around the precession 
frequency $\omega_1$ and another pair around $\omega_2$; see Figure 14.9. The splitting is a 
result of coupling. If the system has more than two spins, the coupling will 
split a peak into up to $2^{n-1}$ peaks, where $n$ is the number of spins. We have also 
combined all the constants into $C$ to make the formula concise. Only four of 
the elements of the density matrix appear in this spectrum, so we need 
to design control pulses that move the desired information to these 
four positions, where it can be read out via the free induction signal. If multiple 
experiments are allowed, all the elements of the density matrix can in theory 
be retrieved [15, 14]. It is also possible to transport the desired information 
(computational results) to the four positions that the observer can see. 
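
The four-frequency structure of the two-spin signal can be reproduced numerically. The sketch below assumes a diagonal Hamiltonian of the form $H = \tfrac12(\omega_1\sigma_z^1 + \omega_2\sigma_z^2 + J\sigma_z^1\sigma_z^2)$ in the basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; this form is our assumption, chosen because it yields the frequencies $\omega_{1,2}\pm J$, and the density matrix entries are arbitrary illustrative numbers. The overall factor 2 from $\sigma_+$ is the kind of constant absorbed into $C$.

```python
import cmath

def two_spin_signal(rho, w1, w2, J, t):
    """Tr(e^{-iHt} rho e^{iHt} sigma_+) for the assumed diagonal Hamiltonian
    H = (w1*sz1 + w2*sz2 + J*sz1*sz2)/2."""
    energies = []
    for s1 in (+1, -1):          # sigma_z eigenvalue of spin 1
        for s2 in (+1, -1):      # sigma_z eigenvalue of spin 2
            energies.append(0.5 * (w1 * s1 + w2 * s2 + J * s1 * s2))
    sigma_plus = [[0, 2, 2, 0], [0, 0, 0, 2], [0, 0, 0, 2], [0, 0, 0, 0]]
    total = 0
    for i in range(4):
        for j in range(4):
            # H is diagonal, so element (i, j) of rho only acquires a phase.
            rho_t = cmath.exp(-1j * (energies[i] - energies[j]) * t) * rho[i][j]
            total += rho_t * sigma_plus[j][i]
    return total

# Arbitrary Hermitian, unit-trace matrix used only as test input.
rho = [
    [0.4, 0.1 - 0.05j, 0.02 + 0.03j, 0.01j],
    [0.1 + 0.05j, 0.3, 0.04 - 0.02j, 0.03 + 0.01j],
    [0.02 - 0.03j, 0.04 + 0.02j, 0.2, 0.05 - 0.04j],
    [-0.01j, 0.03 - 0.01j, 0.05 + 0.04j, 0.1],
]
w1, w2, J, t = 5.0, 3.0, 0.2, 0.8
numeric = two_spin_signal(rho, w1, w2, J, t)
formula = 2 * (rho[2][0] * cmath.exp(1j * (w1 + J) * t)
               + rho[3][1] * cmath.exp(1j * (w1 - J) * t)
               + rho[3][2] * cmath.exp(1j * (w2 - J) * t)
               + rho[1][0] * cmath.exp(1j * (w2 + J) * t))
print(abs(numeric - formula))
```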

Fig. 14.9 Simplified stick spectra of a two-qubit molecule. The two dotted lines show two 
peaks at $\omega_1$ and $\omega_2$, respectively, when no coupling is applied ($J = 0$). After coupling, 
every peak is split into two smaller peaks with the intensities reduced to half. 

A typical pulse used in reading out is a hard $X_{\pi/2}$ pulse, which rotates all 
the spins about the $x$-axis by the angle $\pi/2$. Let us again use a two-spin system as 
an example. The operation is the tensor product of two $x$-rotation operators; 
that is, $X_{\pi/2} = X^1_{\pi/2}X^2_{\pi/2}$. The imaginary parts of the four effective elements 
of the density matrix $\rho'$ after the operation, utilizing the fact that the density 
matrix is Hermitian, are 

$$\begin{aligned}
\mathrm{Im}(\rho'_{31}) &= \tfrac14\bigl(\rho_{33} + \rho_{44} - \rho_{11} - \rho_{22} - 2\,\mathrm{Im}(\rho_{21}) - 2\,\mathrm{Im}(\rho_{34})\bigr), \\
\mathrm{Im}(\rho'_{42}) &= \tfrac14\bigl(\rho_{33} + \rho_{44} - \rho_{11} - \rho_{22} + 2\,\mathrm{Im}(\rho_{21}) + 2\,\mathrm{Im}(\rho_{34})\bigr), \\
\mathrm{Im}(\rho'_{43}) &= \tfrac14\bigl(\rho_{22} + \rho_{44} - \rho_{11} - \rho_{33} + 2\,\mathrm{Im}(\rho_{31}) + 2\,\mathrm{Im}(\rho_{24})\bigr), \\
\mathrm{Im}(\rho'_{21}) &= \tfrac14\bigl(\rho_{22} + \rho_{44} - \rho_{11} - \rho_{33} - 2\,\mathrm{Im}(\rho_{31}) - 2\,\mathrm{Im}(\rho_{24})\bigr).
\end{aligned} \tag{14.54}$$


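Find the sum of $\mathrm{Im}(\rho'_{31})$ and $\mathrm{Im}(\rho'_{42})$ and that of $\mathrm{Im}(\rho'_{43})$ and $\mathrm{Im}(\rho'_{21})$: 

$$\begin{aligned}
\mathrm{Im}(\rho'_{31} + \rho'_{42}) &= -\tfrac12(\rho_{11} + \rho_{22} - \rho_{33} - \rho_{44}), \\
\mathrm{Im}(\rho'_{43} + \rho'_{21}) &= -\tfrac12(\rho_{11} - \rho_{22} + \rho_{33} - \rho_{44}).
\end{aligned} \tag{14.55}$$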


Because what the coils pick up is the change rate of the magnetic field rather 
than the magnetic field itself, the imaginary parts we have listed above are 
reflected in the real part of the spectra. The computation above shows that 
the sum of the real parts of each pair of peaks in the spectra is proportional 
to the population difference between the spin-up and the spin-down states of 
the corresponding spin. 
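
The readout relations for the hard $\pi/2$ pulse can be confirmed on a randomly generated density matrix. This is a sketch under the assumption that each $x$ rotation is $e^{-i\pi\sigma_x/4}$ and that the basis is ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the random matrix is test input only, not data from any experiment.

```python
import math
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def random_density_matrix(seed=0):
    """Hermitian, positive, unit-trace 4x4 matrix built as G G^dagger / Tr."""
    rng = random.Random(seed)
    g = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
         for _ in range(4)]
    m = matmul(g, dagger(g))
    tr = sum(m[i][i] for i in range(4)).real
    return [[m[i][j] / tr for j in range(4)] for i in range(4)]

s = 1 / math.sqrt(2)
X = [[s, -1j * s], [-1j * s, s]]          # exp(-i*pi/4*sigma_x), assumed convention
XX = kron(X, X)                           # hard pulse acting on both spins

rho = random_density_matrix()
rho_p = matmul(matmul(XX, rho), dagger(XX))

p = [rho[i][i].real for i in range(4)]    # populations rho_11 ... rho_44
pair1 = (rho_p[2][0] + rho_p[3][1]).imag  # Im(rho'_31 + rho'_42)
pair2 = (rho_p[3][2] + rho_p[1][0]).imag  # Im(rho'_43 + rho'_21)
print(pair1, -0.5 * (p[0] + p[1] - p[2] - p[3]))
print(pair2, -0.5 * (p[0] - p[1] + p[2] - p[3]))
```

Each printed pair should agree, showing that the summed peak pairs depend only on the population difference of the corresponding spin.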


14.3 Solid-State NMR 


Liquid NMR, discussed in Section 14.2, has several constraints that make 
a liquid NMR quantum computer not scalable. First, as a result of 
the pseudo-pure state preparation, the signal-to-noise ratio decreases 
exponentially as the number of qubits increases, limiting the number of 
qubits that can be realized. Another difficulty arises when we want to control 
the system as accurately as desired. Because the range of the chemical shift is limited by 
nature, the number of qubits represented by the same type of nuclei, such 
as carbon, is constrained, because the resonance frequency gaps between any two 
qubits must be large enough that we can distinguish the qubits easily and 
control them with great precision. It is estimated that a quantum computer 
realized by liquid-state NMR can have at most 10 to 20 qubits. 

Solid-state NMR has the potential to overcome many of the problems of 
its liquid-state counterpart described in the preceding paragraph. These advantages 
are derived partly from the lack of motion of the molecules and partly from 
the ability to cool to low temperatures. As with many potential solutions, 
there are tradeoffs to consider. Here we summarize. 


1. At low temperatures, near or below that of liquid helium, it is possible 
to initialize electron spins using the thermal Boltzmann distribution. Nuclear 
spins do not become significantly oriented until much lower temperatures 
because of their ~1000 times lower energies, but there are existing RF pulse 
sequences that can transfer an electron spin orientation to nearby nuclear 
spins using their mutual spin-spin interaction. In principle, this solves the 
problem of qubit initialization. In practice, the thermal initialization process 
can be slow because it depends on the electron spin population lifetime. It is 
possible to find systems with short electron spin lifetimes, but this will tend 
to result in faster decoherence of the nuclear spins, because they must be 
coupled to the electron in order to initialize in the first place. 

2. Because the molecules in a solid are usually not tumbling, the dipole 
coupling between nearby spins does not average out. This has the advantage 
of making multiqubit gates faster, because the dipole coupling is much larger 
than the scalar coupling. The orientation-dependent chemical shifts also do 
not average out, in principle making individual qubits easier to address so 
that more qubits can be used. Here, it should be noted that custom molecules 
containing electron spins [111] can be used to enhance this effect. There is a 
tradeoff to consider, in that the faster interaction with nearby spins provided 
by dipole coupling can also lead to faster decoherence times. 

3. Spin lifetimes in solids can be much longer than in liquids. Lack of 
molecular motion eliminates the spatial diffusion of spins which is a problem 
in liquid NMR for times in the range of milliseconds or longer [34]. Phonons 
can cause decoherence in solids at room temperature, but this can be strongly 
suppressed at temperatures achievable in liquid helium. It is not unusual to 
see spin population lifetimes of minutes in solids, especially at low tempera- 
tures. Unfortunately, spin coherence times are usually somewhat shorter due 
to dephasing caused by mutual spin flips through the strong dipole coupling. 
To eliminate this decoherence mechanism, there are two main approaches. 
One is to disperse the active molecule, as a dopant in a spin-free host. Ac- 
tually the host does not need to be completely spin-free provided its spins 
are far enough off resonance with those of the active molecule. Another tech- 
nique is to use stoichiometric materials consisting of relatively large unit cells 
containing many spin-free atoms. The idea for both these approaches is to 
keep the active nuclei relatively far apart, except for nearest neighbors. 
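
To make the first point above concrete, the thermal polarization of a spin-1/2 can be estimated as $\tanh(\Delta E/2kT)$ with $\Delta E = g\mu B$. The sketch below uses assumptions that are not taken from the text: a field of 1 T, the free-electron and proton $g$ factors as representative values, and CODATA values for the constants.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
MU_B = 9.2740100783e-24   # Bohr magneton, J/T
MU_N = 5.0507837461e-27   # nuclear magneton, J/T

def polarization(g_factor, magneton, field_tesla, temperature_kelvin):
    """Thermal polarization tanh(dE / 2kT) of a spin-1/2 with dE = g*mu*B."""
    splitting = abs(g_factor) * magneton * field_tesla
    return math.tanh(splitting / (2 * K_B * temperature_kelvin))

FIELD = 1.0               # tesla (assumed for illustration)
G_ELECTRON = 2.0023       # free-electron g factor
G_PROTON = 5.5857         # proton g factor, used as a representative nucleus

for temperature in (300.0, 4.2, 0.01):
    p_e = polarization(G_ELECTRON, MU_B, FIELD, temperature)
    p_n = polarization(G_PROTON, MU_N, FIELD, temperature)
    print(f"T={temperature:7.2f} K  electron={p_e:.3e}  nucleus={p_n:.3e}")
```

At liquid helium temperature the electron polarization in this model is of order 0.1, while the nuclear polarization is smaller by a factor of several hundred, which is why transferring polarization from electrons to nuclei is attractive.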

In addition to the above differences, nuclei with nonzero spin in the solid state 
can also be used for quantum computation [67] and manipulated similarly 
to those in liquid-state NMR. Because all the nuclei are fixed in space, a static 
magnetic field with a strong gradient in one direction separates the nuclei 
into different layers along that direction. Every layer of nuclei can be regarded 
as a qubit, and the qubits have different resonance frequencies because the magnetic 
field differs from one layer to another. Readout can also take 
advantage of the bulk quantum computer, much as with liquid NMR. 
The signal is picked up using methods such as magnetic resonance force 
microscopy. 

There are two types of methods to make such an arrangement of nuclei. A 
crystal such as cerium monophosphide (CeP) is a natural choice, where the 
spin-1/2 $^{31}$P nuclei form periodic layers in the crystal with an interlayer distance of 
about 12 Å [45, 116]. Another method is to grow a chain of $^{29}$Si, which has spin 1/2, 
along the static field direction on a base of pure $^{28}$Si or $^{30}$Si, both of which 
are spin-0 nuclei [1, 68]. The latter approach draws on the mature crystal growth 
and processing technology for silicon from the semiconductor industry. 
Liquid crystal [117] or solid-state samples [70] are also candidates for realizing 
an NMR quantum computer. 

Recently, there has been considerable progress made in the area of opti- 
cally addressed spins in solids. As a result some highly scalable designs have 
recently come forward that have the potential to eliminate all of the lim- 
itations of NMR. Aside from potentially solving NMR’s problems, optical 
addressing has the important advantage that it would provide an interface 
between spin qubits and optical qubits, which is essential to interface with 
existing quantum communication systems, and for quantum networking in 
general. 

Optically addressed spins are better known in the literature as spectral hole 
burning (SHB) materials [81]. Most of these are dopant-host systems that 
exhibit strong zero-phonon optical absorption lines at low temperature. Due 
to the inherent inhomogeneity of dopant-host systems it is often found that 
this optical zero-phonon linewidth is much larger than that of the individual 
atoms. Furthermore, when these transitions are excited with a narrowband 
laser, the resonant atoms can be optically pumped into a different ground 
state, making the material more transparent at the laser frequency. This 
is known as burning a spectral hole, hence the name SHB. In many SHB 
materials, the optical pumping is into different ground-state spin sublevels, 
and hence the hole burning process can initialize spin qubits, as illustrated in 
Figure 14.10. This type of spin qubit initialization can be much faster than 
Boltzmann initialization, especially in spin systems with long spin population 
lifetimes, because the tradeoff between spin lifetime and initialization speed 
is removed. 

In addition to optical initialization of spins, the hole burning process can 
also be used to read out the spin state of the qubits. This happens when 
the quantum algorithm returns some of the spin qubits to the state that 
was initially emptied by the laser, resulting in a temporary increase in laser 
absorption and/or fluorescence that is proportional to the final population of 
this spin state. Of course, the readout process also reinitializes, so one must 
take care to work with a large enough ensemble to achieve the desired readout 
fidelity. In general, optical readout is orders of magnitude more sensitive than 
the typical NMR coil, and so it is possible to work with small ensembles 
consisting of very dilute dopant-host systems that can have very long spin 
coherence lifetimes. 

Fig. 14.10 (a) The signature of spectral hole burning is a narrowband dip in the optical 
absorption spectrum. (b) This dip occurs when an optical laser bleaches out an ensemble 
of atoms at a particular transition frequency. (c) In the case when bleaching is due to spin 
sublevel optical pumping, it can be used to initialize qubits. 

Spin qubit coherence lifetimes in SHB materials can be lengthened by a 
variety of techniques. In dilute dopant-host systems, the choice of a spin-free 
or low-spin host has the largest benefit. Examples include praseodymium [50] 
or europium [32] doped in an yttrium-silicate host (Pr:YSO or Eu:YSO) and 
nitrogen-vacancy [110] (NV) color centers in diamond; see Figure 14.11. 
In Pr:YSO only the yttrium host nuclei have spin, but their magnetic moment 
is very weak. In NV diamond, the only host spins are the ~1% abundant $^{13}$C, 
which can be virtually eliminated with isotopically pure material. In 
dopant-host systems dephasing due to host spins is reduced by the so-called frozen 
core effect [101], wherein the magnetic field generated by the active (qubit) 
spin system tunes nearby host nuclei out of resonance with the rest of the 
crystal up to a distance that defines the frozen core radius. This suppresses 
the energy-conserving mutual spin flips that are the main source of spin 
decoherence. 

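In Pr:YSO the spin Hamiltonian is given by [77]: 

$$H = \mathbf{B}\cdot\bigl(g_J^2\mu_B^2\boldsymbol{\Lambda}\bigr)\cdot\mathbf{B} + \mathbf{B}\cdot\bigl(\gamma_N\mathbf{E} + 2A_Jg_J\mu_B\boldsymbol{\Lambda}\bigr)\cdot\mathbf{I} + \mathbf{I}\cdot\bigl(A_J^2\boldsymbol{\Lambda} + \mathbf{T}_Q\bigr)\cdot\mathbf{I}, \tag{14.56}$$

where the tensor $\boldsymbol{\Lambda}$ is given by 

$$\Lambda_{\alpha\beta} = \sum_{n=1}^{2J+1}\frac{\langle 0|J_\alpha|n\rangle\langle n|J_\beta|0\rangle}{\Delta E_{n,0}}. \tag{14.57}$$

In (14.56), $\mathbf{E}$ is the $3\times3$ identity matrix, $\mathbf{B}$ is the magnetic field, $\mathbf{I}$ is the 
nuclear spin vector, $g_J$ is the Landé $g$ factor, $\gamma_N$ is the nuclear magnetogyric ratio, 
and $A_J$ is the hyperfine interaction. The term $\mathbf{I}\cdot\mathbf{T}_Q\cdot\mathbf{I}$ describes the nuclear 
electric quadrupole interaction and $A_J^2\,\mathbf{I}\cdot\boldsymbol{\Lambda}\cdot\mathbf{I}$ is the second-order magnetic 
hyperfine or pseudoquadrupole interaction. 

Fig. 14.11 (a) Spin sublevels of nitrogen-vacancy (NV) color center in diamond. (b) Spin 
sublevels of Pr:YSO. 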

Recently, a spin coherence lifetime of half a minute has been observed in 
Pr:YSO [37]. This impressive result is made possible by combining two 
techniques. The first technique involves magnetically tuning the qubit spin to a 
level anticrossing [38]. This is common in systems with spin 1 or larger. Near 
such an anticrossing there is no first-order magnetic Zeeman shift. 
Consequently, spin flips of nearby host and active spins, which ordinarily introduce 
decoherence by perturbing the local magnetic field of the qubit, no longer have 
a first-order effect. The complication is that the magnetic field is a vector, 
so the level anticrossing must exist in all three directions. Nonetheless 
such global level anticrossings were found in Pr:YSO and were used to lengthen 
the qubit spin coherence lifetime by orders of magnitude. More important, 
the residual spin decoherence was found to decay as a quadratic exponential 
in time, meaning it decays as $e^{-(t/\tau)^2}$. This is critical because most 
quantum error correction schemes require the short-time decay to be slower than 
the usual linear exponential decay. Because this condition was satisfied in 
Pr:YSO, a version of bang-bang error correction was successfully applied to 
give the observed half-minute coherence times. 
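
The importance of the short-time behavior can be seen in a toy model. The sketch below makes the idealized assumption, ours and not the text's, that each correction pulse fully resets the decay clock; it then compares how much coherence survives a fixed total time for the linear law $e^{-t/\tau}$ and the quadratic law $e^{-(t/\tau)^2}$ as the interval between pulses shrinks.

```python
import math

def remaining_coherence(total_time, interval, tau, quadratic):
    """Coherence left after total_time when the decay restarts every interval.

    Idealized assumption: each correction pulse fully resets the decay clock.
    """
    steps = round(total_time / interval)
    x = interval / tau
    per_step = math.exp(-x * x) if quadratic else math.exp(-x)
    return per_step ** steps

tau, total_time = 1.0, 1.0
for interval in (1.0, 0.1, 0.01):
    linear = remaining_coherence(total_time, interval, tau, quadratic=False)
    quad = remaining_coherence(total_time, interval, tau, quadratic=True)
    print(f"interval={interval:5.2f}  linear={linear:.4f}  quadratic={quad:.4f}")
```

In this model frequent pulses do not help a linear exponential decay at all, whereas for the quadratic law the surviving coherence approaches 1 as the interval is shortened.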

Manipulation of spin qubits in SHB materials is generally done using RF 
coils similar to those used in liquid NMR. Recently, optical Raman transitions 
have been explored as an alternative to this, in which case the spin-qubits 
are manipulated by lasers instead of an RF coil. The advantages of this are 
twofold. First, the gate time can be made faster because it depends on the 
optical Rabi frequency, rather than that of the spin transition. One reason for 
this is that spin transitions are generally magnetic dipole allowed transitions, 
whereas optical transitions are often electric dipole allowed. Another reason 
is that it is often easier to insert strong laser fields into a cryostat than 
strong RF fields. Second, the selectivity of qubit excitation can be improved 
considerably because only spins with the correct optical transition frequency 
and spin transition frequency are manipulated. Additional spatial selectivity 
exists because the optical laser beams can be focused down to microns, and 
only the illuminated part undergoes qubit manipulations. This is especially 
important for algorithms like those designed for a Type II quantum computer; 
see Section 14.5. 

The real power of optically addressed NMR lies in multiqubit manipu- 
lations. The optical “handle” allows several options to increase scalability. 
First, the relatively long range of optical interactions frees NMR from the 
restrictions imposed by near-neighbor interactions. An example of this is an 
ensemble-based two-qubit gate demonstration in Eu:YSO involving ions 
separated by ~100 nanometers [79], which is orders of magnitude larger than 
the distances required by conventional NMR. In this demonstration, a series of 
optical pulses refocuses a “target” optical qubit with a different phase 
depending on whether the “control” qubit is excited. Because these qubits are 
defined only by their transition frequency, neither the exact location nor 
the number of spins located in between is important. This demonstration also 
illustrates the interesting fact that optical transitions in some SHB materials 
have a coherence lifetime that is similar to that of many room temperature 
NMR transitions. 

In principle, long-range optical interactions such as in the Eu:YSO 
example are scalable. In practice, however, the Eu:YSO demonstration experiment 
is not very scalable because well-defined pairs of qubits must be distilled out 
of a random ensemble [94], and this incurs an exponential penalty with the 
number of qubits. To make this technique more scalable, one approach would be to 
apply it to special solid-state pseudo-molecules. These pseudo-molecules exist 
in a number of stoichiometric crystals, the most interesting of which are those 
containing europium, for example EuVO$_4$ or Eu$_2$O$_3$, because of their narrow 
optical transitions at low temperatures [48, 78]. In these pseudo-molecules, 
localized defects have a large effect on the optical transition frequency of the 
Eu ions. Up to 50 optical transitions can be easily resolved in these 
materials. Assuming that all the defects are identical, each optical transition would 
correspond to a Eu spin system in a well-defined location near the defect 
center, thereby producing a pseudo-molecule. By using the long-range 
optical coupling demonstrated in Eu:YSO, one could in principle construct up to 
a 50-qubit quantum computer without most of the usual scaling limitations 
of NMR. 

To achieve scalability beyond 100 qubits, single-spin manipulation is pre- 
ferred. The excitation and especially detection of single spins in a solid is a 
very active area of research. Much of this research is based on a proposal 
to build a quantum computer using qubits consisting of nuclear spins of 
phosphorus atoms implanted in a silicon host [59]; see Figure 14.12. $^{31}$P nuclei 
in spin-free $^{28}$Si have a spin population lifetime on the order of hours 
at ultracold temperatures, with coherence lifetimes on the scale of tens of 
milliseconds so far. Single-qubit manipulations would be done with the usual 
NMR pulse sequences. However, to avoid driving all the qubits at once, an 
off-resonant RF field is applied and the active qubit is tuned into resonance 
when desired by using the interaction with its electron spin. The electron spin 
in turn is controlled by distorting the P electron cloud with a voltage applied 
to a nearby gate, called the A-electrode. To achieve two-qubit logic, the elec- 
tron clouds of two neighboring P atoms are overlapped using a J-electrode. 
The resulting exchange interaction between the two electrons can then be 
transferred to the P nuclei using RF and/or gate pulse sequences. Because 
the P atom has electron spin, Boltzmann initialization can be used, although 
a number of faster alternatives such as spin injection are being explored. For 
readout, the nuclear spin state is transferred to the electron spin which in 
turn is converted into a charge state via spin exchange interactions with a 
nearby readout atom. The charge state is then detected with a single-electron 
transistor. 
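The Hamiltonian for P in Si is given by [59]: 

$$H = \mu_BB\sigma_z^e - g_n\mu_nB\sigma_z^n + A\,\boldsymbol{\sigma}^e\cdot\boldsymbol{\sigma}^n, \tag{14.58}$$

where $\mu_B$ is the Bohr magneton, $\mu_n$ is the nuclear magneton, $g_n$ is the nuclear 
g-factor, $B$ is the applied magnetic field (assumed parallel to $z$), and $\boldsymbol{\sigma}$ are 
the Pauli matrices, with superscripts $e$ and $n$ for electron and nuclear. For 
two coupled qubits the Hamiltonian becomes: 

$$H = H(B) + A_1\boldsymbol{\sigma}^{1n}\cdot\boldsymbol{\sigma}^{1e} + A_2\boldsymbol{\sigma}^{2n}\cdot\boldsymbol{\sigma}^{2e} + J\boldsymbol{\sigma}^{1e}\cdot\boldsymbol{\sigma}^{2e}, \tag{14.59}$$

where $H(B)$ is the magnetic field part, the superscripts 1 and 2 refer to 
the two spins, and 

$$4J(r) \cong 1.6\,\frac{e^2}{\varepsilon a_B}\left(\frac{r}{a_B}\right)^{5/2}\exp\left(-\frac{2r}{a_B}\right), \tag{14.60}$$

where $r$ is the distance between P atoms, $\varepsilon$ is the dielectric constant of silicon, 
and $a_B$ is the effective Bohr radius. 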
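
The strong distance dependence of the exchange coupling in (14.60) is easy to tabulate in dimensionless form. The sketch below evaluates $4J$ in units of $e^2/(\varepsilon a_B)$ as a function of $r/a_B$; no material parameters are assumed, and the sample separations are chosen only for illustration.

```python
import math

def exchange_4j(r_over_ab):
    """4J(r) in units of e^2/(eps*a_B), following the form of (14.60)."""
    return 1.6 * r_over_ab ** 2.5 * math.exp(-2 * r_over_ab)

for x in (1.25, 2.0, 4.0, 8.0):
    print(f"r/a_B = {x:5.2f}   4J = {exchange_4j(x):.3e}")

# Doubling the separation from 4 a_B to 8 a_B weakens the coupling
# by a factor of several hundred.
ratio = exchange_4j(4.0) / exchange_4j(8.0)
print(ratio)
```

Because the exponential dominates at large separation, small changes in the donor spacing or in the electron cloud overlap (controlled by the J-electrode) change the coupling by orders of magnitude.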

Many of the more challenging operations proposed for the P in Si quan- 
tum computer have recently been demonstrated in quantum wells/dots using 
individual electrons and/or excitons [102]. In most of these experiments elec- 
tron spins, rather than nuclear spins, are the qubits. Unfortunately, these 
experiments are usually done in GaAs which is not a spin-free host and so 
decoherence is a problem. Spin-free semiconductor hosts exist, but fabrication 
in these systems is not yet as mature (except for silicon). 
Fig. 14.12 Operation of the P doped Si quantum computer. (a) Single qubits are tuned 
into resonance with the applied microwave field using voltages on A-electrodes to distort the weakly 
bound electron cloud. (b) Two-qubit gates are enabled by using voltages on J-electrodes 
to overlap neighboring electron clouds, thereby switching on spin exchange coupling. 


Although the exciting prospect of single electron and/or nuclear spin de- 
tection is currently a technical challenge for most solids, it was done long 
ago in nitrogen-vacancy (NV) diamond [47]. In this optically active SHB ma- 
terial, single-electron spin qubits are routinely initialized and read out with 
high fidelity at liquid helium temperature. In fact, the fidelity is so high 
that it begins to compare to trapped ions [52]. The electron spin coherence 
has also been transferred to nearby nuclear spins to perform (nonscalable) 
two-qubit logic [53]. Single-qubit logic is usually done with RF pulses, but 
optical Raman transitions between spin sublevels have been observed in NV 
diamond under certain experimental conditions [49]. Two-qubit gates can be 
performed using the electron spin coupling between adjacent qubits. Initial- 
ization of such a two-qubit system consisting of a NV and nearby N atom has 
recently been demonstrated using optical pumping. Scalability of this system 
can be achieved using an electron spin resonance (ESR) version of a Raman 
transition to transfer this spin coupling to the nuclear spins [113]. This ESR 
Raman has already been demonstrated for single NV spins. 

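The spin Hamiltonian for NV diamond is given by [92] 

$$H = D\left(S_z^2 - \tfrac13S^2\right) + \beta\,\mathbf{B}\cdot\mathbf{g}\cdot\mathbf{S} + \mathbf{S}\cdot\mathbf{A}\cdot\mathbf{I} + P\left(I_z^2 - \tfrac13I^2\right), \tag{14.61}$$

where $D$ is the zero-field splitting, $\mathbf{B}$ is the applied magnetic field, $\mathbf{A}$ is 
the electron-nuclear hyperfine coupling tensor, $\mathbf{S}$ and $\mathbf{I}$ are the electron and 
nuclear spin vectors, and $P$ is the nuclear quadrupole contribution. 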

With optical Raman, it should be possible to use long-range optical 
dipole-dipole coupling [80], or eventually cavity-based quantum electrodynamic 
(QED) coupling, to perform two-qubit spin logic; see Figure 14.13. If 
successful, this will be highly scalable because the optical transition frequency 
can be tuned where desired using dc electric Stark shifts created by gate 
electrodes (A-electrodes). By requiring both optical laser frequency and electrode 
voltages to be correctly tuned, qubit excitation becomes very selective and 
two-qubit coupling only exists when needed, in contrast to the usual case in 
NMR. 

Fig. 14.13 (a) Two distant qubits coupled by vacuum mode of cavity using cavity QED. 
(b) Possible implementation of a multiqubit bus using a photon bandgap cavity in NV 
diamond. (c) Illustration of the use of optical Raman with cavity QED to couple spectrally 
adjacent qubits. 

Electron spin population lifetimes up to minutes have so far been ob- 
served at low temperature, with coherence lifetimes up to 0.3 milliseconds 
at room temperature [52]. Interestingly, the optical initialization and read- 
out still works at room temperature (although with less fidelity), as well as 
the ESR Raman transitions. This raises the intriguing question of whether 
a room-temperature solid-state NMR quantum computer can eventually be 
built using NV diamond. 
14.4 Shor’s Algorithm and Its Experimental Realization 


Through the rest of the chapter, we describe two applications of the NMR 
quantum computer: Shor’s algorithm and a lattice algorithm. Entangled 
states are extremely important in quantum computation. Entanglement, to- 
gether with superposition, gives a quantum computer the power to perform 
massively parallel computation and thus makes it particularly suitable for 
computing certain complex problems. Shor’s algorithm for the factorization 
of integers, aimed at decryption, is a prime example of a “killer app” of 
quantum computing [30, 96]. Recently, a successful experiment has shown the 
potential for implementing Shor’s algorithm, although it is 
still very simple and tentative. In [111], Vandersypen et al. factor 15 into 3 
times 5. That work has shown the liquid NMR quantum computer to be the 
most successful quantum computer so far. 


14.4.1 Shor’s Algorithm 


It is not difficult to factor a composite (i.e., nonprime) integer into prime 
numbers when that integer is small, but the computational burden grows rapidly 
as the number increases. The most efficient algorithm currently known, the 
number field sieve, requires a running time of 
$\exp\bigl(c\,(\log n)^{1/3}(\log\log n)^{2/3}\bigr)$, which grows faster than any polynomial in $\log n$, 
where $n$ is the number to be factored and clearly $\log n$ is proportional to 
the number of the bits needed to store this number. This makes it practi- 
cally impossible to factor a large number using a classical computer. This 
difficulty is used to construct several cryptosystems, such as the RSA public 
key cryptosystem [95]. Peter W. Shor has shown that this problem can be 
solved in polynomial running time instead of exponential time by using the 
quantum computer. A more accessible account of Shor’s algorithm is given 
by Lomonaco [75]. 

Let $n$ be an odd integer to be factored, and choose a random integer 
$x$ less than $n$. We require $x$ to be coprime with $n$; otherwise, we find a factor 
of $n$ immediately by the Euclidean algorithm. It is then known that the function 
$f(s) = x^s \bmod n$ is periodic. The period of $f$ (and also of $x$) is the smallest 
positive integer $r$ such that $x^r \equiv 1 \pmod n$. For example, when $n = 15$ and $x = 7$, the 
residues of $x^s$, with $s$ being 1, 2, 3, ..., are 7, 4, 13, 1, 7, 4, 13, 1, ..., and the 
period is 4. 

Now we check the period $r$. If $r$ is even, $r = 2t$, then $x^{2t} - 1 = (x^t + 1)(x^t - 1) \equiv 0 \pmod n$, 
so either $x^t - 1$ or $x^t + 1$ has a common factor with $n$. A classical 
computer can use the Euclidean algorithm to compute the greatest common 
divisors, denoted as $\gcd(x^t + 1, n)$ and $\gcd(x^t - 1, n)$, in polynomial time. It is 
possible that we obtain only the trivial factors 1 or $n$ using the $x$ we choose. 
This happens only when $x^t \equiv -1 \pmod n$, because $x^t - 1 \equiv 0 \pmod n$ cannot 
happen with $r$ being already the smallest integer such that $x^r \equiv 1 \pmod n$. 
Fortunately it has been proved that the probability of meeting such a bad $x$ is 
at most $1/2^k$, where $k$ is the number of distinct prime factors of $n$. Because 
$k$ is at least 2, the probability is still large enough for us to find a good $x$, 
which has an even period $r$ and $x^t \not\equiv -1 \pmod n$. 
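
The classical reduction from period finding to factoring can be sketched in a few lines. In the example below the period is found by brute force, which stands in for the quantum step and is feasible only for small $n$; the function names are ours and purely illustrative.

```python
from math import gcd

def order(x, n):
    """Smallest r >= 1 with x**r == 1 (mod n); brute force, x coprime to n."""
    if gcd(x, n) != 1:
        raise ValueError("x must be coprime with n")
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r

def factors_from_order(x, n):
    """Return a pair of nontrivial factors of n, or None if x is a bad choice."""
    r = order(x, n)
    if r % 2 == 1:
        return None                      # odd period: try another x
    half = pow(x, r // 2, n)             # x^t mod n with r = 2t
    if half == n - 1:
        return None                      # x^t = -1 (mod n): only trivial factors
    return gcd(half - 1, n), gcd(half + 1, n)

print(order(7, 15), factors_from_order(7, 15))    # prints: 4 (3, 5)
print(factors_from_order(14, 15))                 # prints: None
```

For $n = 15$ and $x = 7$ we have $t = 2$ and $x^t \bmod 15 = 4$, so the two greatest common divisors are $\gcd(3, 15) = 3$ and $\gcd(5, 15) = 5$. The choice $x = 14 \equiv -1$ is an example of a bad $x$.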

A quantum computer can find the period $r$ because of the speedup afforded 
by the quantum Fourier transform (QFT). Let us have two $b$-qubit registers. 
We select $b$ large enough such that we can observe many periods. At the 
beginning, we set the two registers to the state $|0\rangle$. Then we randomize the first 
register to a new state 

$$|\psi_1\rangle = \frac{1}{\sqrt{S}}\sum_{k=0}^{S-1}|k\rangle|0\rangle, \tag{14.62}$$

where $S = 2^b$, the total number of $b$-qubit basis states of the first register, with 
$b$ large enough such that $2n^2 > S > n^2$. 

We now design a certain series of pulses to compute $f(k) = x^k \bmod n$, and 
change the quantum state to 

$$|\psi_2\rangle = \frac{1}{\sqrt{S}}\sum_{k=0}^{S-1}|k\rangle|f(k)\rangle. \tag{14.63}$$

Now, we apply the QFT [90] to the first register in (14.63), which is a unitary 
transform mapping every $|k\rangle$ to another state: 

$$|k\rangle \to \frac{1}{\sqrt{S}}\sum_{u=0}^{S-1}e^{2\pi iuk/S}|u\rangle. \tag{14.64}$$

Then the quantum state of the system changes to 

$$|\psi_3\rangle = \frac{1}{S}\sum_{u=0}^{S-1}|u\rangle\sum_{k=0}^{S-1}e^{2\pi iuk/S}|f(k)\rangle. \tag{14.65}$$


Assume that $f(k)$ has period $r$, and write $k = d + jr$ such that $0 \le d < r$, 
where $d$ is the remainder of $k$ after it is divided by $r$ and $j$ ranges from 0 
to $A$, the largest integer such that $Ar < S$. This way, we can write $|\psi_3\rangle$ as 

$$|\psi_3\rangle = \frac{1}{S}\sum_{u=0}^{S-1}|u\rangle\sum_{d=0}^{r-1}e^{2\pi iud/S}|f(d)\rangle\sum_{j=0}^{A}e^{2\pi iurj/S}I_{(d+rj<S)},$$

where $I_{(d+rj<S)} = 1$ when $d + rj < S$, and 0 otherwise. If $S = (A+1)r$, 
$I_{(d+rj<S)} = 1$ for every $d$ and $j$. If $S \ne (A+1)r$, it is still reasonable to ignore 
the difference and let $I_{(d+rj<S)} = 1$ everywhere because we have chosen $S$ 
large enough. In this case, we let 

$$b_u = \frac{1}{S}\sum_{j=0}^{A}e^{2\pi iurj/S}; \tag{14.66}$$

thus our quantum state is now 

$$|\psi_3\rangle = \sum_{u=0}^{S-1}\sum_{d=0}^{r-1}b_ue^{2\pi iud/S}|u\rangle|f(d)\rangle.$$


We can now measure the first register, and we want to find a $u$ for 
which there is an $l$ satisfying 

$$\left|\frac{u}{S} - \frac{l}{r}\right| \le \frac{1}{2S}. \tag{14.67}$$

There are about $r$ such values of $u$, and it has been estimated that the probability of 
finding such a $u$ is at least 0.4 [90]. Because $1/2S < 1/2n^2$, and we know that 
$r < n$, there is at most one fraction $l/r$ satisfying the condition and we can 
use continued fraction expansions to find the fraction. If $l$ and $r$ are coprime, 
we obtain $r$ as the denominator of the fraction. If not, we find only a factor 
of $r$. If $r$ is odd or $x^{r/2}$ does not give a useful result, we choose another $x$ and 
try again. It may be necessary to try several times (of the order $O(\log\log n)$) 
until $r$ is successfully found, but the overall running time is still reasonable. 
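
The measurement statistics implied by (14.65) and the continued-fraction step can be simulated classically for a small case. The sketch below takes $n = 15$, $x = 7$, and $b = 8$ (so that $n^2 < S = 256 < 2n^2$); these values and the helper names are our choices for illustration. The probability of outcome $u$ is obtained by summing, over the distinct values of $f$, the squared magnitude of the corresponding amplitude, and `Fraction.limit_denominator` is used as a stand-in for the continued fraction expansion.

```python
import cmath
from fractions import Fraction

def measurement_distribution(x, n, b):
    """P(u) for the first register after the QFT, computed from (14.65)."""
    S = 2 ** b
    groups = {}                       # value of f(k) -> list of k with that value
    for k in range(S):
        groups.setdefault(pow(x, k, n), []).append(k)
    probs = []
    for u in range(S):
        p = 0.0
        for ks in groups.values():
            amp = sum(cmath.exp(2j * cmath.pi * u * k / S) for k in ks) / S
            p += abs(amp) ** 2
        probs.append(p)
    return probs

def period_candidate(u, S, n):
    """Denominator of the best fraction l/r approximating u/S with r < n."""
    return Fraction(u, S).limit_denominator(n - 1).denominator

n, x, b = 15, 7, 8
S = 2 ** b
probs = measurement_distribution(x, n, b)
peaks = [u for u, p in enumerate(probs) if p > 1e-9]
print(peaks, [round(probs[u], 6) for u in peaks])
print([period_candidate(u, S, n) for u in peaks])
```

Here the period divides $S$, so the distribution is concentrated on multiples of $S/r = 64$. The outcomes $u = 64$ and $u = 192$ yield $r = 4$ directly, while $u = 128$ yields only the factor 2 of $r$ because $l = 2$ is not coprime with $r$.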


14.4.2 Circuit Design for Shor’s Algorithm 


Before we introduce the experiment by Vandersypen et al. [111], we extend 
the above discussion a little further to the case when $r$ divides $S$. Now $S/r$ 
becomes an integer and (14.66) always holds, so that $S$ does not have to be 
very large. Moreover, (14.67) becomes an identity, 

$$u = \frac{l\,S}{r}; \tag{14.68}$$

that is, $r$ is the denominator of the fraction $u/S$ after canceling the common 
factor between $u$ and $S$, if $l$ and $r$ are coprime. The integer 15 falls into this 
situation. The possible $x$ can be 2, 4, 7, 8, 11, or 13. When we choose $x$ to 
be 2, 7, 8, or 13, the period $r$ is 4. In the other cases, $r$ is 2. The period $r$ divides 
$S = 2^b$ in both cases. At most two qubits are required to compute one 
period of $f$. In the experiment, three qubits are used to obtain more periods. 
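
The special case $n = 15$ can be tabulated directly. The sketch below computes the period for each listed $x$ by brute force and, for a three-qubit first register ($S = 8$), lists the peak positions $u = lS/r$ together with the denominator of $u/S$ in lowest terms. The helper `order` is an illustrative name of ours.

```python
from fractions import Fraction

def order(x, n):
    """Smallest r >= 1 with x**r == 1 (mod n), found by brute force."""
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r

n, S = 15, 2 ** 3   # three qubits in the first register, as in the experiment
table = {}
for x in (2, 4, 7, 8, 11, 13):
    r = order(x, n)
    peaks = [l * S // r for l in range(r)]                 # u = l*S/r
    recovered = [Fraction(u, S).denominator for u in peaks]
    table[x] = (r, peaks, recovered)
    print(x, r, peaks, recovered)
```

The denominator equals $r$ exactly when $l$ and $r$ are coprime; for $l = 0$ or for $l$ sharing a factor with $r$, only a divisor of $r$ is recovered.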

Vandersypen et al. used liquid NMR to realize Shor's algorithm in factorizing
15. The sample in the experiment is a custom-synthesized material whose
molecules have five $^{19}$F and two $^{13}$C nuclei, so it has seven qubits ready for use.
Those seven qubits are divided into two registers, three to store the number
$k$ (the first register, represented by $|k_2 k_1 k_0\rangle$) and four to store the modular
exponentiation $y$ (the second register, represented by $|y_3 y_2 y_1 y_0\rangle$); see Figures
14.14 and 14.15. The total Hamiltonian is


\[
H = \sum_{i=1}^{7} \omega_i I_z^i + \sum_{i<j} 2\pi J_{ij} I_z^i I_z^j .
\]


Each run of the experiment consists of four steps. In the first step, the
sample is initialized to a certain pseudo-pure state; in the second step, a series
of specially designed pulses is applied to realize the computation of modular
exponentiation; in the third step, QFT is applied to the first register; finally,
the period is obtained by reading the spectrum. The system
begins from thermal equilibrium, where the density matrix is given, up to
normalization, by $\rho_0 = e^{-H/kT} \approx I - H/kT$. A suitable initial pseudo-pure
state $|\psi_1\rangle = |0000001\rangle$ is obtained by the temporal averaging method.

Although it is difficult to design a general circuit for the modular exponentiation,
it is easy to "hard-wire" one for the special case under consideration. As the
exponent $k$ can be written as $k = k_0 + 2k_1 + 4k_2$, we can change the modular
exponential $x^k \bmod 15$ into successive modular multiplications by
$x^{2^i k_i}$, with $i = 0, 1, 2$, applied to the second register $y$ beginning from
$y = 1$.

When $i = 0$, $y \cdot x = x = 1 + (x - 1)$, so the multiplication is actually a
controlled addition of $(x - 1)$ in the case $k_0 = 1$. For $x = 7 = (0111)_2$, this
amounts to flipping the states of $y_1$ and $y_2$ ($y = (0001)_2$ before the multiplication).
For $x = 11 = (1011)_2$, the same reasoning shows that we only have to flip the
states of $y_3$ and $y_1$, depending on $k_0$. Gates A and B in Figures 14.14 and
14.15 accomplish the modular multiplication by $x^{k_0}$.

The situation is a little more complicated for $i = 1$. We only discuss the
situation when $k_1 = 1$, because $y$ will not change when $k_1 = 0$. Different
strategies are needed for $x = 7$ and $x = 11$. When $x = 11$, because $11^2 =
121 = 15 \times 8 + 1$, $y \times 11^2 = y \pmod{15}$. We need to do nothing, and the same
result holds for the third qubit $k_2$. When $x = 7$, we can design the circuit by
first investigating the following identity.

\[
\begin{aligned}
y \cdot 7^2 &= y \cdot 4 \bmod 15 \\
&= (y_0 + 2y_1 + 4y_2 + 8y_3) \cdot 4 \bmod 15 \\
&= (4y_0 + 8y_1 + 16y_2 + 32y_3) \bmod 15 \\
&= (y_2 + 2y_3 + 4y_0 + 8y_1) \bmod 15 \\
&= (y_2 \cdot 2^0 + y_3 \cdot 2 + y_0 \cdot 2^2 + y_1 \cdot 2^3) \bmod 15.
\end{aligned}
\]


It shows that the modular multiplication can be achieved by exchanging the
first qubit $y_0$ with the third qubit $y_2$, and the second qubit $y_1$ with the
fourth qubit $y_3$. In Figure 14.14, gates C, D, and E are used to accomplish
the former, and gates F, G, and H the latter. Further simplification of the
circuit can be made. Because the control bit $y_3$ is $|0\rangle$ before gate C, that gate
can just be omitted. Gates H and E have no effect on the period; they can
be omitted, too.

Fig. 14.14 The quantum circuit for the (hard) case for the realization of Shor's algorithm
($x = 7$). From top to bottom, the qubits are $k_2$, $k_1$, $k_0$, $y_3$, $y_2$, $y_1$, and $y_0$, respectively, in
sequential order.

Fig. 14.15 The quantum circuit for the (easy) case for the realization of Shor's algorithm
($x = 11$). From top to bottom, the qubits are $k_2$, $k_1$, $k_0$, $y_3$, $y_2$, $y_1$, and $y_0$, respectively,
in sequential order.
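The hard-wired multiplications described above can be checked with ordinary integer arithmetic. The sketch below is a classical verification only; the function name `swap_pairs` is our own. It confirms that multiplying by $7^2 \equiv 4 \pmod{15}$ is the qubit exchange $y_0 \leftrightarrow y_2$, $y_1 \leftrightarrow y_3$, and that the $k_0$ step starting from $y = 1$ is a pair of bit flips.

```python
def swap_pairs(y: int) -> int:
    """Exchange bit y0 with y2 and bit y1 with y3 of a 4-bit integer."""
    y0, y1, y2, y3 = (y >> 0) & 1, (y >> 1) & 1, (y >> 2) & 1, (y >> 3) & 1
    return y2 | (y3 << 1) | (y0 << 2) | (y1 << 3)


# Multiplication by 4 modulo 15 acts as the bit permutation on every residue 0..14.
permutation_ok = all(swap_pairs(y) == (4 * y) % 15 for y in range(15))

# k0 step starting from y = 0001: multiplying by x equals an XOR with x - 1.
after_times_7 = 0b0001 ^ 0b0110     # flip y1 and y2
after_times_11 = 0b0001 ^ 0b1010    # flip y1 and y3
```

The permutation works because $16 \equiv 1 \pmod{15}$, so multiplying by 4 rotates the four bits by two positions.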

The circuit for the quantum Fourier transform follows a standard
design; see, for example, [30, Figure 5]. It has three Hadamard gates and three
controlled-phase gates. Figures 14.14 and 14.15 show the circuit designs for
$x = 7$ and $x = 11$. About 300 pulses are used in the experiment in total, and
it takes about 700 ms to accomplish all steps in the case of $x = 7$.
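To make the gate count concrete, the following sketch simulates a textbook three-qubit QFT built from three Hadamard gates and three controlled-phase gates and compares it with the discrete Fourier transform. It is an assumption-laden illustration, not the pulse sequence of [111]: the sign convention $e^{+2\pi i u k/S}$, the qubit labeling (qubit 2 is the most significant bit), and the final bit reversal, which in practice is a relabeling of the outputs, are all our own choices.

```python
import cmath
import math

DIM = 8  # three qubits


def apply_h(state, q):
    """Hadamard on qubit q (bit q of the basis index)."""
    out = [0j] * DIM
    for idx, amp in enumerate(state):
        a = amp / math.sqrt(2)
        out[idx & ~(1 << q)] += a
        out[idx | (1 << q)] += -a if (idx >> q) & 1 else a
    return out


def apply_cphase(state, c, t, angle):
    """Multiply the amplitude by exp(i*angle) when qubits c and t are both 1."""
    phase = cmath.exp(1j * angle)
    return [amp * phase if (idx >> c) & 1 and (idx >> t) & 1 else amp
            for idx, amp in enumerate(state)]


def reverse_bits(state):
    """Relabel the qubits in reverse order (swap bit 0 with bit 2)."""
    out = [0j] * DIM
    for idx, amp in enumerate(state):
        out[int(format(idx, "03b")[::-1], 2)] = amp
    return out


def qft3(state):
    s = apply_h(state, 2)
    s = apply_cphase(s, 1, 2, math.pi / 2)
    s = apply_cphase(s, 0, 2, math.pi / 4)
    s = apply_h(s, 1)
    s = apply_cphase(s, 0, 1, math.pi / 2)
    s = apply_h(s, 0)
    return reverse_bits(s)


def max_error():
    """Largest deviation from exp(2*pi*i*u*k/8)/sqrt(8) over all basis inputs."""
    worst = 0.0
    for k in range(DIM):
        basis = [0j] * DIM
        basis[k] = 1 + 0j
        out = qft3(basis)
        for u in range(DIM):
            exact = cmath.exp(2j * math.pi * u * k / DIM) / math.sqrt(DIM)
            worst = max(worst, abs(out[u] - exact))
    return worst
```

Calling `max_error()` returns a value at the level of floating-point rounding, showing that the six gates reproduce the Fourier transform on three qubits.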


14.4.3 Experimental Result 


Readout of the experiment needs a careful interpretation of the data. Because
an NMR sample consists of many molecules, the readout is the average value
of $u$ from all molecules instead of the reading from a single molecule.

Both qubits $k_0$ and $k_1$ are found to be in state $|0\rangle$ after the extraction of
the spectra [111] for the easy case of $x = 11$, whereas qubit $k_2$ is in an equal
mixture of $|0\rangle$ and $|1\rangle$. Thus the possible $u$ can be 0 and 4, that is, 000
and 100 in binary form. From (14.68), $r$ can be obtained as $r = 8/4 = 2$, and
the greatest common divisors are computed as $\gcd(11^{2/2} + 1, 15) = 3$ and
$\gcd(11^{2/2} - 1, 15) = 5$.

In the case of $x = 7$, the spectra in [111] indicate that only qubit $k_0$ is
in state $|0\rangle$, and both qubits $k_1$ and $k_2$ are in an equal mixture of states
$|0\rangle$ and $|1\rangle$. Thus $u$ is in a mixture of states $|0\rangle$, $|2\rangle$, $|4\rangle$, and $|6\rangle$. We can
see that the period of $u$ is 2, thus the period of the modular exponentiation is
$r = 8/2 = 4$. The factors of 15 can finally be obtained as $\gcd(7^{4/2} - 1, 15) = 3$
and $\gcd(7^{4/2} + 1, 15) = 5$.
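The ideal, noise-free outcome of both runs can be reproduced classically. The sketch below assumes a perfect QFT with the sign convention $e^{+2\pi i u k/S}$ and $S = 8$; the function names are our own. It computes which values of $u$ have nonzero probability and then applies the greatest-common-divisor step.

```python
import cmath
import math


def u_distribution(x, n=15, S=8):
    """Ideal probabilities P(u) after the QFT acts on sum_k |k>|x**k mod n>/sqrt(S)."""
    probs = [0.0] * S
    for v in {pow(x, k, n) for k in range(S)}:
        ks = [k for k in range(S) if pow(x, k, n) == v]
        for u in range(S):
            amp = sum(cmath.exp(2j * math.pi * u * k / S) for k in ks) / S
            probs[u] += abs(amp) ** 2
    return probs


def support(probs, tol=1e-9):
    """Values of u that occur with non-negligible probability."""
    return [u for u, p in enumerate(probs) if p > tol]


def factors(x, r, n=15):
    """Classical post-processing: gcd(x**(r/2) - 1, n) and gcd(x**(r/2) + 1, n)."""
    half = x ** (r // 2)
    return math.gcd(half - 1, n), math.gcd(half + 1, n)


easy = support(u_distribution(11))   # expected peaks for x = 11
hard = support(u_distribution(7))    # expected peaks for x = 7
```

The supports are $\{0, 4\}$ for $x = 11$ and $\{0, 2, 4, 6\}$ for $x = 7$, in agreement with the qubit states read from the spectra, and `factors(11, 2)` and `factors(7, 4)` both yield 3 and 5.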


14.5 Quantum Algorithm for Lattice-Gas Systems 


In the previous sections, we have explained how to construct a quantum computer
using liquid NMR and illustrated a successful experiment. We have
taken it for granted that the coherence can be maintained long enough and 
different qubits can be entangled even if they are separated far apart in space. 
Unfortunately, these assumptions are not always practical and in fact they 
constitute great obstacles to overcome. The problem becomes more serious 
when more qubits are involved. Type-II quantum computers are proposed 
to alleviate this problem. A Type-II quantum computer is composed of a 
network or array of small quantum computers interconnected by classical 
communication channels [120]. Instead of the global coherence and entangle- 
ment, only local coherence and entanglement within every small quantum 
computer, called a node, are required, and the difficulty faced by the central- 
ized quantum computer is dramatically eased. 

The wave function of the whole Type-II quantum computer system is a 
tensor product over all the nodes: 


\[
|\psi(t)\rangle = |\psi(x_1, t)\rangle \otimes \cdots \otimes |\psi(x_N, t)\rangle, \tag{14.69}
\]


where $N$ is the number of nodes. The lattice-gas algorithm (LGA) is specially
suited for this structure. Every computation cycle can be broken up into three
steps with two intermediate states $|\psi'\rangle$ and $|\psi''\rangle$:

\[
\begin{aligned}
|\psi'\rangle &= C\,|\psi(t)\rangle, \\
|\psi''\rangle &= \Gamma\,|\psi'\rangle, \\
|\psi(t+1)\rangle &= T\,|\psi''\rangle,
\end{aligned} \tag{14.70}
\]

where $C$ is a unitary operator acting locally on every node, $\Gamma$ is a projection
operator, such as a measurement, and $T$ is the streaming operator that
exchanges information among nodes. The Type-II quantum computer takes
advantage of parallelism in two ways: one classical, where all the nodes work
simultaneously; the other quantum, where quantum entanglement is still kept
inside every node. Because measurement is applied and the system is reset
at the end of every computation cycle, the coherence only needs to be maintained
for a short time.


14.5.1 Quantum Algorithm for a Lattice-Gas Model 


Consider a one-dimensional diffusion equation without boundary condition

\[
\frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial x^2}, \tag{14.71}
\]

where $\rho$ is the mass density or temperature function along the $x$-axis. Using
the finite difference method, we can write a finite difference approximation
to solve the above partial differential equation numerically:

\[
\frac{\rho(x, t+\tau) - \rho(x, t)}{\tau} = D\, \frac{\rho(x+l, t) - 2\rho(x, t) + \rho(x-l, t)}{l^2}, \tag{14.72}
\]

where $\tau$ is the time step size and $l$ is the space step size. From physics, we
know that the above equation may be studied by a lattice-gas algorithm.
Without loss of generality, we assume that $\tau$ and $l$ are normalized so that the
difference equation can be written as

\[
\rho(x_i, k+1) - \rho(x_i, k) = \frac{1}{2}\big(\rho(x_{i+1}, k) - 2\rho(x_i, k) + \rho(x_{i-1}, k)\big).
\]


Points $x_i$, also called nodes, are evenly distributed along the $x$-axis. To study
the above equation, two functions, $f_1(x_i, k)$ and $f_2(x_i, k)$, called channels, are
defined for each node $x_i$ at time $k$. The set of values of these two functions
is called the state of node $x_i$. Any physical observable, such as the density
function $\rho(x_i, k)$, is a function of the state at the node. The evolution of the
lattice, or the state of all nodes, consists of two operations: collision and
propagation. A collision is a local operator defined only by the state of the
node itself. The propagation operator transfers information from one node
to another, and the state at one node changes according to the state of other
nodes. This is done by defining a velocity vector for every channel, which
gives the information flow a direction. In our special example here, information
in the two channels flows in opposite directions. After propagation, one
channel gets its new value from its left neighbor, and the other from its right
neighbor. This LGA has been implemented on a Type-II quantum computer by J.
Yepez of the Air Force Research Laboratory and M.A. Pravia et al. of the Department
of Nuclear Engineering at MIT [8, 93, 119, 120]. The actual result
is not as good as desired, but improvement is still possible.

To store a floating-point number, a classical computer uses a register with
32 or 64 bits, depending on the machine. Quantum computers presently have
difficulty in doing it the same way as classical computers because there is not
yet the technology for 32 or 64 qubits. In this quantum lattice-gas algorithm,
a two-qubit system is proposed for every node. The two qubits are represented
by $|q_1(x_i, k)\rangle$ and $|q_2(x_i, k)\rangle$, respectively, and

\[
\begin{aligned}
|q_1(x_i, k)\rangle &= \sqrt{f_1(x_i, k)}\,|0\rangle + \sqrt{1 - f_1(x_i, k)}\,|1\rangle, \\
|q_2(x_i, k)\rangle &= \sqrt{f_2(x_i, k)}\,|0\rangle + \sqrt{1 - f_2(x_i, k)}\,|1\rangle.
\end{aligned}
\]

The state of the whole system $|\psi(x_i, k)\rangle$ at node $x_i$ and time $k$ is a tensor
product:

\[
\begin{aligned}
|\psi(x_i, k)\rangle &= |q_1(x_i, k)\rangle |q_2(x_i, k)\rangle \\
&= \sqrt{f_1(x_i, k) f_2(x_i, k)}\,|00\rangle + \sqrt{(1 - f_1(x_i, k)) f_2(x_i, k)}\,|10\rangle \\
&\quad + \sqrt{f_1(x_i, k)(1 - f_2(x_i, k))}\,|01\rangle \\
&\quad + \sqrt{(1 - f_1(x_i, k))(1 - f_2(x_i, k))}\,|11\rangle.
\end{aligned}
\]


Quantities $f_1(x_i, k)$ and $f_2(x_i, k)$ are the probabilities of occurrence of the
state $|0\rangle$ for qubits 1 and 2, respectively, corresponding to the two channels,
and $1 - f_{1,2}(x_i, k)$ are the occurrence probabilities of the state $|1\rangle$.
Because the states are normalized, $0 \le f_{1,2}(x_i, k) \le 1$, and we let $\rho(x_i, k) =
f_1(x_i, k) + f_2(x_i, k)$. It is noted that our Type-II quantum computer assigns
$\rho$ a continuous value (a function of the occurrence probabilities) rather than
a discrete value as a digital computer does. An array of two-qubit systems is
used in the computation, corresponding to a series of nodes.

The quantum LGA here has three steps in every cycle that complete a
step of the finite difference algorithm computation: collision, measurement,
and reinitialization. The last two steps together amount to one propagation
operation in a normal lattice-gas algorithm. Because the propagation needs
information exchange among different nodes, measurement and classical communication
are needed to accomplish one operation. We map the quantum
state to a vector in $\mathbb{C}^4$ as given in (14.29).

In the collision step, a unitary operator is applied simultaneously to all
nodes:

\[
|\bar\psi(x_i, k)\rangle = U\,|\psi(x_i, k)\rangle,
\]

where

\[
U = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \frac{1+i}{2} & \frac{1-i}{2} & 0 \\
0 & \frac{1-i}{2} & \frac{1+i}{2} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{14.73}
\]

The new occurrence probabilities of the state $|0\rangle$ of the two qubits after
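the operation, $\bar f_1$ and $\bar f_2$, can be computed using

\[
\begin{aligned}
\bar f_1 &= \langle \bar\psi | n_1 | \bar\psi \rangle, &
n_1 &= \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}, \\
\bar f_2 &= \langle \bar\psi | n_2 | \bar\psi \rangle, &
n_2 &= \begin{pmatrix} 1&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&0 \end{pmatrix},
\end{aligned} \tag{14.74}
\]

leading to

\[
\begin{aligned}
\bar f_1(x_i, k) &= \tfrac{1}{2}\big(f_1(x_i, k) + f_2(x_i, k)\big), \\
\bar f_2(x_i, k) &= \tfrac{1}{2}\big(f_1(x_i, k) + f_2(x_i, k)\big).
\end{aligned}
\]

The collision operator thus performs an averaging. The state after
the collision is also called the local equilibrium.

In the second step, a measurement is applied at every node and $\bar f_{1,2}(x_i, k)$
of all the nodes are retrieved for future use.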
In the third step, using information from the measurement in the previous
step, the states of all the nodes are reinitialized to a separable state

\[
\begin{aligned}
|q_1(x_i, k+1)\rangle &= \sqrt{\bar f_1(x_{i+1}, k)}\,|0\rangle + \sqrt{1 - \bar f_1(x_{i+1}, k)}\,|1\rangle, \\
|q_2(x_i, k+1)\rangle &= \sqrt{\bar f_2(x_{i-1}, k)}\,|0\rangle + \sqrt{1 - \bar f_2(x_{i-1}, k)}\,|1\rangle.
\end{aligned} \tag{14.75}
\]

It can be seen that the second and third steps have accomplished the propagation
operation. At node $x_i$, the new state of channel one is acquired from
the same channel of its right neighbor node $x_{i+1}$, and channel two acquires
its state from its left neighbor. It is complicated here only because the communication
between two quantum systems is difficult.
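The averaging property of the collision step can be checked numerically for a single node. The sketch below assumes that the collision operator is the $\sqrt{\mathrm{SWAP}}$ gate written in the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; this choice, the function names, and the sample values $f_1 = 0.9$, $f_2 = 0.3$ are our own assumptions. The measurement is modeled by the exact expectation values of $n_1$ and $n_2$, that is, by the ensemble average.

```python
import math

# Assumed collision operator: sqrt(SWAP), basis order |00>, |01>, |10>, |11>.
U = [
    [1, 0, 0, 0],
    [0, (1 + 1j) / 2, (1 - 1j) / 2, 0],
    [0, (1 - 1j) / 2, (1 + 1j) / 2, 0],
    [0, 0, 0, 1],
]


def node_state(f1, f2):
    """Product state |q1>|q2> with P(q1 = 0) = f1 and P(q2 = 0) = f2."""
    a0, a1 = math.sqrt(f1), math.sqrt(1 - f1)
    b0, b1 = math.sqrt(f2), math.sqrt(1 - f2)
    return [a0 * b0, a0 * b1, a1 * b0, a1 * b1]


def collide(psi):
    """Apply U to the four-component state vector."""
    return [sum(U[r][c] * psi[c] for c in range(4)) for r in range(4)]


def measure(psi):
    """Expectation values <n1> and <n2>: probabilities that each qubit is in |0>."""
    p = [abs(a) ** 2 for a in psi]
    return p[0] + p[1], p[0] + p[2]


f1_bar, f2_bar = measure(collide(node_state(0.9, 0.3)))
```

Both outputs equal $(0.9 + 0.3)/2 = 0.6$ up to rounding, which is the local equilibrium described above.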

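To see how this LGA works, let us begin from a local equilibrium state,
$f_1(x_i, k) = f_2(x_i, k) = \rho(x_i, k)/2$, where the states come from a collision
operation (step 2). We list the $f_{1,2}$ around position $x_i$ before the third step
in two rows

\[
\begin{array}{rcccccc}
f_1: & \cdots & \frac{\rho(x_{i-2}, k)}{2} & \frac{\rho(x_{i-1}, k)}{2} & \frac{\rho(x_i, k)}{2} & \frac{\rho(x_{i+1}, k)}{2} & \cdots \\[4pt]
f_2: & \cdots & \frac{\rho(x_{i-2}, k)}{2} & \frac{\rho(x_{i-1}, k)}{2} & \frac{\rho(x_i, k)}{2} & \frac{\rho(x_{i+1}, k)}{2} & \cdots
\end{array}
\]

and after the third step

\[
\begin{array}{rcccccc}
f_1: & \cdots & \frac{\rho(x_{i-1}, k)}{2} & \frac{\rho(x_i, k)}{2} & \frac{\rho(x_{i+1}, k)}{2} & \frac{\rho(x_{i+2}, k)}{2} & \cdots \\[4pt]
f_2: & \cdots & \frac{\rho(x_{i-3}, k)}{2} & \frac{\rho(x_{i-2}, k)}{2} & \frac{\rho(x_{i-1}, k)}{2} & \frac{\rho(x_i, k)}{2} & \cdots
\end{array}
\]

We can see that the row of $f_1$ (channel one) moves left and the row of $f_2$
(channel two) moves right. According to our definition, the new value of $\rho$ is
the sum of $f_1(x_i, k+1)$ and $f_2(x_i, k+1)$; that is, $\rho(x_i, k+1) = \frac{1}{2}(\rho(x_{i+1}, k) +
\rho(x_{i-1}, k))$. It is easy to check that

\[
\rho(x_i, k+1) - \rho(x_i, k) = \frac{1}{2}\big(\rho(x_{i+1}, k) - 2\rho(x_i, k) + \rho(x_{i-1}, k)\big),
\]

as desired.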
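The equivalence with the finite difference scheme can also be confirmed numerically at the level of the occurrence probabilities. The sketch below is a classical emulation, not the NMR experiment: it replaces the collision and measurement by the averaging they produce, and it assumes a periodic lattice and an arbitrary initial profile, both our own choices.

```python
def lga_step(f1, f2):
    """One cycle on a periodic lattice: collide (average), then stream."""
    n = len(f1)
    avg = [(a + b) / 2 for a, b in zip(f1, f2)]       # collision + measurement
    new_f1 = [avg[(i + 1) % n] for i in range(n)]     # channel 1 from right neighbor
    new_f2 = [avg[(i - 1) % n] for i in range(n)]     # channel 2 from left neighbor
    return new_f1, new_f2


def fd_step(rho):
    """One step of the normalized finite difference scheme."""
    n = len(rho)
    return [rho[i] + 0.5 * (rho[(i + 1) % n] - 2 * rho[i] + rho[(i - 1) % n])
            for i in range(n)]


rho0 = [0.2, 0.4, 1.0, 1.6, 1.0, 0.4, 0.2, 0.2]      # hypothetical initial density
f1 = [r / 2 for r in rho0]
f2 = [r / 2 for r in rho0]
rho_lga, rho_fd = list(rho0), list(rho0)
for _ in range(5):
    f1, f2 = lga_step(f1, f2)
    rho_lga = [a + b for a, b in zip(f1, f2)]
    rho_fd = fd_step(rho_fd)

max_diff = max(abs(a - b) for a, b in zip(rho_lga, rho_fd))
```

After five cycles `max_diff` is at the level of rounding error, and the total density is conserved because streaming only moves values between nodes.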

Applications of the Type-II quantum computer with quantum LGA have also
been reported in the simulation of the time-dependent evolution of a
many-body quantum-mechanical system [121], the solution of one-dimensional
magnetohydrodynamic turbulence [105], the representation of solitons [104], and
other equations.


14.5.2 Physical Realization and Result 


The experiment in Section 14.5.1 uses a two-qubit molecule, chloroform, 
whose structure is shown in Figure 14.6. The hydrogen and carbon nuclei 
serve as qubits 1 and 2, respectively. 

The actual results obtained from the experiment are compared with simulation
results [93]. After 12 steps, the error becomes very large. Imperfection
in the decoupling sequences is blamed, and it is believed that the problem can
be mitigated when the technology is improved in the future. The extremely
high accuracy required of the control pulses and the readout is a disadvantage of
this Type-II quantum computer, because it uses a continuous representation
(the probability of occurrence) instead of a discrete one. Thus, it is more
vulnerable to inaccuracy in the NMR operation. Small errors in every
step accumulate and finally become intolerable. Repeated measurement and
reinitialization ease the requirement for coherence time, but place a high
requirement on the fidelity at the same time.


14.6 Conclusion 


In this chapter, we present the basic technology used to construct a quantum 
computer with liquid-state NMR. The successful experiments for many algo- 
rithms have shown that liquid-state NMR is capable of simulating a quantum 
computer and forms a testbed for quantum algorithms. It is so far the only 
technology available to realize a seven-qubit algorithm in the laboratory. One 
reason for its success is the robustness of the spin system which only interacts 
with the external magnetic field, and it is possible to maintain the coherence 
for a long time (from seconds to hours). In addition, over the 60-year history
of NMR spectroscopy, analytic tools have been developed for the purpose of
chemical and medical applications, and exact description and dedicated coherence
control of the dynamics of the quantum spin system are now available
to achieve high accuracy in the pulse design and application. In fact, the experimental
techniques established in NMR, especially the coherence control
technology, can be easily transferred to other quantum systems if they have 
a similar Hamiltonian, and the research in NMR is therefore helpful for the 
development of other more complicated and powerful quantum computers. 

Liquid NMR has played a pioneering role in quantum computer technol- 
ogy development. But its lack of scalability has constituted a severe obstacle 
to its future applicability. However, in Section 14.3 we show new technology 
of solid-state NMR that has the potential to overcome liquid NMR’s diffi- 
culties. For solid-state NMR, under low temperature, the relaxation times 
of spins are typically very long, and the coupling between qubits is strong 
so that the control can be fast and easy. The small ratio of the gate time
to the decoherence time makes more gates available, and more complicated
algorithms can be tested. The nuclei can be cooled down easily and the spin 
system is highly polarized. The signal is much stronger so that fewer nuclei 
are needed. Even without the help of the gradient field and silicon technology, 
as we have mentioned, a quantum computer with 30 to 40 qubits is envisioned 
with designed molecules similar to that of the liquid-state NMR computer 
except that the ensemble is in a solid crystal state. This is already a quantum 
system that reaches the limit a classical computer can simulate. Although it 
is still not scalable and not a standard quantum computer, these small- and 
medium-scale quantum computers will help in the building of a scalable and 
working quantum computer. 


Acknowledgment We wish to thank the reviewer for constructive comments that have 
improved the quality of this chapter. 


Appendix 
The Homomorphism from SU(2) to SO(3)


We use the same notation as in Section 14.2.3. Recall that $U$ is a complex
matrix in $SU(2)$, and we want to find a mapping $R$ from $SU(2)$ to the
space of $3 \times 3$ real matrices, so that $R(U)$ and $U$ represent the same physical
operation. Let $\mathbf{v}$ be a Bloch vector corresponding to the old state $|\psi\rangle$ and $\mathbf{v}'$
to the new one. Then the following identity must be satisfied,

\[
\mathbf{v}' = R(U)\,\mathbf{v}, \tag{14.76}
\]

where $U \in SU(2)$. Using the definition of Bloch vector and density matrix,
we obtain

\[
\begin{aligned}
\mathbf{v}' \cdot A &= U |\psi\rangle\langle\psi| U^\dagger - I_0 \\
&= U\, (\mathbf{v} \cdot A)\, U^\dagger,
\end{aligned} \tag{14.77}
\]

where $A = [I_x, I_y, I_z]$ and we use the dot to denote the inner product between
vectors.
The product of $I_i$ and $\sigma_j$ satisfies

\[
\mathrm{Tr}(I_i \sigma_j) = \delta_{ij}, \qquad i, j \in \{1, 2, 3\},
\]

where $\sigma_j$ are the Pauli matrices. By multiplying both sides of (14.77) with
$\sigma_k$ and taking the trace of both sides, we obtain

\[
\begin{aligned}
v'_k &= \mathrm{Tr}(\mathbf{v}' \cdot A\, \sigma_k) \\
&= \mathrm{Tr}(\sigma_k U I_i U^\dagger)\, v_i,
\end{aligned} \tag{14.78}
\]

where $v'_k$ and $v_i$ are, respectively, the $k$th and $i$th entry of the corresponding
vectors. We apply the summation convention in this section, where summing
over repeated indices is implied unless otherwise stated. Comparing the above
result with the equation

\[
v'_k = R(U)_{ki}\, v_i,
\]

we obtain the desired matrix $R(U)$:

\[
R(U)_{ki} = \mathrm{Tr}(\sigma_k U I_i U^\dagger). \tag{14.79}
\]


Thus we have constructed a mapping from SU(2) to the set of 3 x 3 matrices. 
Let us now tentatively accept the fact that the target matrix is real which we 
prove later. We first show that R(U) is in fact a proper rotation matrix, that 
is, R(U) € SO(3), by proving that it is isometric and preserves the threefold 
product (cf. (14.82)). 

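Let $\mathbf{v} = [a\ b\ c]^T \in \mathbb{R}^3$. First note that we can find its norm by computing
the determinant of a special matrix:

\[
\begin{aligned}
\det(\mathbf{v} \cdot A) &= \frac{1}{4} \begin{vmatrix} c & a - bi \\ a + bi & -c \end{vmatrix} \\
&= -\frac{1}{4}(c^2 + a^2 + b^2) \\
&= -\frac{1}{4}\|\mathbf{v}\|^2.
\end{aligned} \tag{14.80}
\]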


Together with (14.77), we see that the transformation is isometric:

\[
\begin{aligned}
\|\mathbf{v}'\|^2 &= -4 \det(\mathbf{v}' \cdot A) \\
&= -4 \det\big(U\, (\mathbf{v} \cdot A)\, U^\dagger\big) \\
&= -4 \det(\mathbf{v} \cdot A) \\
&= \|\mathbf{v}\|^2.
\end{aligned} \tag{14.81}
\]


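Direct computation also shows the preservation of the threefold product, as
follows. Let $\mathbf{v}^l$, $l = 1, 2, 3$, be three vectors on the Bloch sphere, $\mathbf{v}'^l = R(U)\mathbf{v}^l$.
We have

\[
\begin{aligned}
\mathrm{Tr}\big((\mathbf{v}^1 \cdot A)(\mathbf{v}^2 \cdot A)(\mathbf{v}^3 \cdot A)\big) &= \mathrm{Tr}(v^1_i v^2_j v^3_k\, I_i I_j I_k) \\
&= \frac{i}{4}\, v^1_i v^2_j v^3_k\, \epsilon_{ijk} \\
&= \frac{i}{4}\, \mathbf{v}^1 \cdot (\mathbf{v}^2 \times \mathbf{v}^3).
\end{aligned} \tag{14.82}
\]

The above identity can be used to show the preservation of the threefold
product:

\[
\begin{aligned}
\mathbf{v}'^1 \cdot (\mathbf{v}'^2 \times \mathbf{v}'^3) &= \frac{4}{i}\, \mathrm{Tr}\big((\mathbf{v}'^1 \cdot A)(\mathbf{v}'^2 \cdot A)(\mathbf{v}'^3 \cdot A)\big) \\
&= \frac{4}{i}\, \mathrm{Tr}\big((U \mathbf{v}^1 \cdot A\, U^\dagger)(U \mathbf{v}^2 \cdot A\, U^\dagger)(U \mathbf{v}^3 \cdot A\, U^\dagger)\big) \\
&= \frac{4}{i}\, \mathrm{Tr}\big((\mathbf{v}^1 \cdot A)(\mathbf{v}^2 \cdot A)(\mathbf{v}^3 \cdot A)\big) \\
&= \mathbf{v}^1 \cdot (\mathbf{v}^2 \times \mathbf{v}^3),
\end{aligned} \tag{14.83}
\]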

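and we see that $R(U) \in SO(3)$. To prove that the mapping is surjective, we
introduce a parameterization of $SU(2)$ in some exponential form. For any
$U \in SU(2)$, we can find a $\theta \in [0, 2\pi)$ and a unit vector $\mathbf{n}$ such that

\[
\begin{aligned}
U(\theta, \mathbf{n}) &= e^{-i(\theta/2)\, \mathbf{n} \cdot \boldsymbol{\sigma}} \\
&= \begin{pmatrix}
\cos\frac{\theta}{2} - i n_3 \sin\frac{\theta}{2} & -\sin\frac{\theta}{2}\,(n_2 + i n_1) \\
\sin\frac{\theta}{2}\,(n_2 - i n_1) & \cos\frac{\theta}{2} + i n_3 \sin\frac{\theta}{2}
\end{pmatrix} \\
&= \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma},
\end{aligned} \tag{14.84}
\]

where $\boldsymbol{\sigma} = [\sigma_x, \sigma_y, \sigma_z]$. Every combination of $\theta$ and $\mathbf{n}$ also corresponds to
a complex matrix in $SU(2)$. By using (14.8) and the equality $U(\theta, \mathbf{n})^{-1} =
U(\theta, -\mathbf{n})$, the element of $R(U)$ can be computed as

\[
\begin{aligned}
R(U)_{ij} &= \mathrm{Tr}(\sigma_i U I_j U^\dagger) \\
&= \mathrm{Tr}\Big(\sigma_i \big(\cos\tfrac{\theta}{2}\, I - i \sin\tfrac{\theta}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma}\big) I_j \big(\cos\tfrac{\theta}{2}\, I + i \sin\tfrac{\theta}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma}\big)\Big) \\
&= \mathrm{Tr}\Big(\sigma_i I_j \cos^2\tfrac{\theta}{2} + i \sigma_i I_j\, \mathbf{n} \cdot \boldsymbol{\sigma}\, \sin\tfrac{\theta}{2} \cos\tfrac{\theta}{2} \\
&\qquad - i \sigma_i\, \mathbf{n} \cdot \boldsymbol{\sigma}\, I_j \sin\tfrac{\theta}{2} \cos\tfrac{\theta}{2} + \sigma_i\, \mathbf{n} \cdot \boldsymbol{\sigma}\, I_j\, \mathbf{n} \cdot \boldsymbol{\sigma}\, \sin^2\tfrac{\theta}{2}\Big).
\end{aligned} \tag{14.85}
\]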
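We divide the trace into four parts and compute them separately:

\[
\mathrm{Tr}(\sigma_i I_j) = \delta_{ij}, \qquad
\mathrm{Tr}(\sigma_i I_j\, \mathbf{n} \cdot \boldsymbol{\sigma}) = i \epsilon_{ijk} n_k, \qquad
\mathrm{Tr}(\sigma_i\, \mathbf{n} \cdot \boldsymbol{\sigma}\, I_j) = i \epsilon_{ikj} n_k; \tag{14.86}
\]

\[
\begin{aligned}
\mathrm{Tr}(\sigma_i\, \mathbf{n} \cdot \boldsymbol{\sigma}\, I_j\, \mathbf{n} \cdot \boldsymbol{\sigma}) &= \mathrm{Tr}(\sigma_i \sigma_k I_j \sigma_l\, n_k n_l) \\
&= \frac{1}{2} \mathrm{Tr}\big((\delta_{ik} I + i \epsilon_{ikm} \sigma_m)(\delta_{jl} I + i \epsilon_{jln} \sigma_n)\, n_k n_l\big) \\
&= \delta_{ik} \delta_{jl}\, n_k n_l - \epsilon_{ikm} \epsilon_{jlm}\, n_k n_l \\
&= n_i n_j - (\delta_{ij} \delta_{kl} - \delta_{il} \delta_{jk})\, n_k n_l = 2 n_i n_j - \delta_{ij}.
\end{aligned} \tag{14.87}
\]

Substituting (14.87) and (14.86) back into (14.85), we obtain

\[
\begin{aligned}
R(U)_{ij} &= \cos^2\tfrac{\theta}{2}\, \delta_{ij} + \sin^2\tfrac{\theta}{2}\, (2 n_i n_j - \delta_{ij}) \\
&\quad - \epsilon_{ijk} n_k \sin\tfrac{\theta}{2} \cos\tfrac{\theta}{2} + \epsilon_{ikj} n_k \sin\tfrac{\theta}{2} \cos\tfrac{\theta}{2} \\
&= \cos\theta\, \delta_{ij} + (1 - \cos\theta)\, n_i n_j + \sin\theta\, \epsilon_{ikj} n_k,
\end{aligned} \tag{14.88}
\]

showing that $R(U)_{ij}$ is real. We also claim that $R(U)$ is a rotation about the
axis $\mathbf{n}$ with angle $\theta$ in the three-dimensional Euclidean space by comparing
it with the standard formula of a rotation matrix. Because every matrix in
$SO(3)$ can be regarded as a rotation about a certain axis with a certain
angle, that matrix is now shown to be an image of some $U \in SU(2)$. Thus
the mapping is surjective.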
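The closed form can be cross-checked numerically against the trace formula. The sketch below is our own illustration in plain Python: it assumes $I_i = \sigma_i/2$, evaluates $R(U)_{ki} = \mathrm{Tr}(\sigma_k U I_i U^\dagger)$ directly for one hypothetical angle and axis, and compares the result with the axis-angle expression $\cos\theta\,\delta_{ij} + (1-\cos\theta)\,n_i n_j + \sin\theta\,\epsilon_{ikj} n_k$.

```python
import math

SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def su2(theta, n):
    """U = cos(theta/2) I - i sin(theta/2) n.sigma for a unit vector n."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c - 1j * s * n[2], -s * (n[1] + 1j * n[0])],
            [s * (n[1] - 1j * n[0]), c + 1j * s * n[2]]]


def rotation_from_su2(u):
    """R(U)_{ki} = Tr(sigma_k U I_i U^dagger) with I_i = sigma_i / 2."""
    ud = dagger(u)
    rot = []
    for k in range(3):
        row = []
        for i in range(3):
            m = matmul(matmul(SIGMA[k], u), matmul(SIGMA[i], ud))
            row.append(complex(m[0][0] + m[1][1]) / 2)
        rot.append(row)
    return rot


def axis_angle(theta, n):
    """cos(t) d_ij + (1 - cos(t)) n_i n_j + sin(t) eps_ikj n_k."""
    def eps(i, j, k):
        return (i - j) * (j - k) * (k - i) / 2
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (i == j) + (1 - c) * n[i] * n[j]
             + s * sum(eps(i, k, j) * n[k] for k in range(3))
             for j in range(3)] for i in range(3)]


theta, axis = 1.1, (2 / 3, -1 / 3, 2 / 3)     # hypothetical angle and unit axis
R = rotation_from_su2(su2(theta, axis))
reference = axis_angle(theta, axis)
max_diff = max(abs(R[i][j] - reference[i][j]) for i in range(3) for j in range(3))
max_imag = max(abs(R[i][j].imag) for i in range(3) for j in range(3))
```

Both `max_diff` and `max_imag` come out at rounding level, consistent with $R(U)$ being real and equal to the rotation about $\mathbf{n}$ by $\theta$.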

Finally, we want to show that this mapping from $SU(2)$ to $SO(3)$ is a
homomorphism and to investigate the multiplicity. Let $U$ and $T$ be two matrices
in $SU(2)$, $\mathbf{v}$ be an arbitrary vector on the Bloch sphere, and $\mathbf{v}' = R(U)\mathbf{v}$,
$\mathbf{v}'' = R(T)\mathbf{v}' = R(T)R(U)\mathbf{v}$. Using (14.77) repeatedly, we obtain

\[
\mathbf{v}'' \cdot A = T (\mathbf{v}' \cdot A) T^\dagger = T U (\mathbf{v} \cdot A) U^\dagger T^\dagger = R(TU)\mathbf{v} \cdot A,
\]

thus $\mathbf{v}'' = R(TU)\mathbf{v}$. Because $\mathbf{v}$ is arbitrary, $R(TU) = R(T)R(U)$, implying
that the mapping is a homomorphism.

For the multiplicity of this mapping, we investigate the kernel of $R$,
$\ker(R)$, which is an invariant subgroup of $SU(2)$, and the quotient group
$SU(2)/\ker(R)$ will then be isomorphic to $SO(3)$. Suppose that $U \in SU(2)$
and $R(U) = I_3$, the identity matrix in $SO(3)$. Then for any $\mathbf{v}$ on the Bloch
sphere or in $\mathbb{R}^3$, we have the equality $\mathbf{v} \cdot A = U (\mathbf{v} \cdot A) U^\dagger$, because $\mathbf{v} = I_3 \mathbf{v}$.
Multiplying both sides with $U$ from the right and using (14.84), we have

\[
\mathbf{v} \cdot A \big(\cos\tfrac{\theta}{2}\, I - i \sin\tfrac{\theta}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma}\big) = \big(\cos\tfrac{\theta}{2}\, I - i \sin\tfrac{\theta}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma}\big)\, \mathbf{v} \cdot A.
\]

Subtracting the term $\mathbf{v} \cdot A \cos(\theta/2)\, I$ from both sides gives

\[
(\mathbf{v} \cdot A)(\mathbf{n} \cdot \boldsymbol{\sigma}) \sin\tfrac{\theta}{2} = (\mathbf{n} \cdot \boldsymbol{\sigma})(\mathbf{v} \cdot A) \sin\tfrac{\theta}{2}.
\]

Assuming that $\sin(\theta/2) \ne 0$, we can divide this factor from both sides to
obtain

\[
(\mathbf{v} \cdot A)(\mathbf{n} \cdot \boldsymbol{\sigma}) = (\mathbf{n} \cdot A)(\mathbf{v} \cdot \boldsymbol{\sigma}).
\]

Let $\mathbf{v} = [1, 0, 0]^T$. Using properties of Pauli matrices, we obtain

\[
n_2 I_z - n_3 I_y = 0.
\]

The above is possible only when $n_2 = n_3 = 0$. Trying other $\mathbf{v}$ leads
to $n_1 = 0$, too. This is a contradiction, because we know that $\mathbf{n}$ is a unit
vector. Thus we need $\sin(\theta/2) = 0$. In this case $U = I$ or $U = -I$, and it
is easy to verify that these two are really mapped to the identity in $SO(3)$.
Now, we can conclude that the mapping we defined by (14.79) is a two-to-one
homomorphism from $SU(2)$ to $SO(3)$ with kernel $\ker(R) = \{I, -I\}$.
The mapping is also surjective, so it defines an isomorphism from the quotient
group $SU(2)/\ker(R)$ to $SO(3)$. The two elements in the kernel, $\pm I$, are in
fact the same transformation for quantum systems because only the relative
phase matters for a quantum system. For any $O \in SO(3)$, the two elements in
$R^{-1}(O)$, $U$ and $-U$ for some $U \in SU(2)$, represent the same transformation,
too. Thus, nothing is lost if we employ $SO(3)$ to represent the transformations
of a one-spin quantum system.
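The homomorphism property and the two-element kernel can be spot-checked numerically. The following sketch is our own illustration; the two group elements are hypothetical choices, and $I_i = \sigma_i/2$ is assumed as in the trace formula (14.79).

```python
import math

SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)
IDENTITY_3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def su2(theta, n):
    """U = cos(theta/2) I - i sin(theta/2) n.sigma for a unit vector n."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c - 1j * s * n[2], -s * (n[1] + 1j * n[0])],
            [s * (n[1] - 1j * n[0]), c + 1j * s * n[2]]]


def rot(u):
    """Real 3x3 matrix with entries Tr(sigma_k U (sigma_i / 2) U^dagger)."""
    ud = [[u[j][i].conjugate() for j in range(2)] for i in range(2)]

    def entry(k, i):
        m = matmul(matmul(SIGMA[k], u), matmul(SIGMA[i], ud))
        return (m[0][0] + m[1][1]).real / 2

    return [[entry(k, i) for i in range(3)] for k in range(3)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


U = su2(0.7, (0.6, 0.0, 0.8))       # hypothetical group elements
T = su2(2.1, (0.0, 1.0, 0.0))
minus_U = [[-z for z in row] for row in U]

homomorphism_ok = close(rot(matmul(T, U)), matmul(rot(T), rot(U)))
same_image_ok = close(rot(minus_U), rot(U))
kernel_ok = (close(rot(su2(0.0, (0.0, 0.0, 1.0))), IDENTITY_3)
             and close(rot(su2(2 * math.pi, (0.0, 0.0, 1.0))), IDENTITY_3))
```

All three flags evaluate to `True`: the product rule $R(TU) = R(T)R(U)$ holds, $U$ and $-U$ share one image, and $\pm I$ (here $\theta = 0$ and $\theta = 2\pi$) are mapped to the identity rotation.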
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